A similarity measure for approximate querying over RDF data

De Virgilio, Roberto; Maccioni, Antonio; Torlone, Riccardo

doi:10.1145/2457317.2457352

"Approximate query answering relies on a similarity measure that evaluates the relevance, for a given query, of a set of data extracted from the underlying database. In the context of graph-modeled data, many methods (such as, subgraph isomorphism, graph edit distance, and maximum common subgraph) have been proposed to face this problem. Unfortunately, they are usually hard to compute and when they are used on RDF data, several drawbacks arise. In this paper, we propose a measure to evaluate the similarity between a (small) graph representing a query and a portion of a (large) graph representing an RDF data set. We show that this measure: (i) can be evaluated in linear time with respect to the size of the given graphs and, (ii) guarantees other interesting properties. In order to show the feasibility of our approach, we have used such similarity measure in a technique for approximate query answering. The technique has been implemented in a prototypical system and a number of experimental results obtained with this system confirm the effectiveness of the proposed measure."

De Virgilio, R., Maccioni, A., Torlone, R. (2013). A similarity measure for approximate querying over RDF data. In EDBT '13: Proceedings of the Joint EDBT/ICDT 2013 Workshops (pp. 205-213). New York, NY, USA: Association for Computing Machinery, Inc. (ACM) [10.1145/2457317.2457352].