Research Project done by the Semantic Computing Research Lab,
Computer Science Department, North Carolina State University. June 22, 2009
Retrospective Interpretation of Keyword Queries on RDF Databases
Abstract: Earlier efforts on supporting keyword queries on structured
databases focused on the IR style of simply returning subgraph matches of
keywords, which users must then filter through. More recent approaches,
particularly in the context of querying
RDF databases, place an emphasis on explicitly “interpreting” or
“structurizing” a keyword query prior to answering it. A key challenge that
must be overcome in this respect is that of dealing with the large number of
candidate query interpretations due to the inherent ambiguity of keyword
queries. Different heuristics, based on statistical and structural properties
of the database, have been proposed, but these approaches do not consistently
produce good results. One important direction that remains unexplored is the
use of query history for contextualizing a query, enabling aggressive pruning
of the space of candidate interpretations.
This paper addresses the problem of retrospectively interpreting
(structurizing) a keyword query on an RDF database using information in a
query log. It contributes a dynamic cost model that captures the relative
relevance of schema segments for a given query in a way that distinguishes
current and aging querying contexts. It further contributes a novel indexing
technique for finding and updating the top-K most relevant schema segments as
new queries enter and older queries “age out” of querying contexts. Finally,
it contributes a context-aware top-K query generation algorithm that produces
a list of K structured queries containing the most likely intended
interpretations of a given keyword query. Experimental results on
effectiveness, measured with extended precision and recall, and on scalability
are presented and are very promising.
Keywords: Keyword Query, RDF, Disambiguation, Query History, Query Interpretation
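
As a rough illustration of the idea of weighting schema segments by how recently they appeared in the query log, the following is a minimal sketch and not the cost model or index from the paper. It assumes a simple exponential decay by query age; the function names, the `decay` parameter, and the segment labels are all hypothetical and chosen only for this example.

```python
import heapq
from collections import defaultdict


def segment_scores(query_log, decay=0.5):
    """Score schema segments from a query log (oldest query first).

    Assumption for illustration: each past query is represented as the set
    of schema segments it touched, and its contribution shrinks by a factor
    of `decay` for every newer query that follows it.
    """
    scores = defaultdict(float)
    newest = len(query_log) - 1
    for position, segments in enumerate(query_log):
        age = newest - position          # 0 for the most recent query
        weight = decay ** age            # older queries contribute less
        for segment in segments:
            scores[segment] += weight
    return dict(scores)


def top_k_segments(query_log, k, decay=0.5):
    """Return the k highest-scoring (segment, score) pairs, best first."""
    scores = segment_scores(query_log, decay)
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


# Hypothetical log of three earlier queries, oldest first.
log = [
    {"Author-writes-Paper"},
    {"Paper-publishedIn-Venue"},
    {"Author-writes-Paper", "Author-affiliatedWith-University"},
]
best = top_k_segments(log, k=2)
```

In this sketch a segment used in both an old and a recent query outranks one used only recently, and a segment seen only in an older query fades as new queries arrive. Recomputing all scores on every query is the naive approach; the indexing technique mentioned in the abstract is aimed at avoiding that cost.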