Exploratory search is the new frontier of information consumption as it goes well beyond simple lookups. Information repositories are ubiquitous and grow larger every day, and automated search systems help users find information in such collections.
To extract knowledge from these repositories, the common ``query lookup'' retrieval paradigm accepts a set of specifications (the query) that describes the objects of interest and then collects such objects. Yet the query lookup paradigms commonly in use are no longer sufficient to support complex information needs: they can only provide candidate starting points and do not help users expand their knowledge. To ease access to and consumption of rich information repositories, we address the crucial problem of data exploration. Exploratory tasks match the natural need for finding answers to open-ended information needs within an unfamiliar environment.
In particular, in this dissertation, we focus on enabling access to and exploration of rich information graphs. Within businesses, organizations, and among researchers, data is produced in many forms, in large volumes, and in different contexts. As a consequence of this heterogeneity, many applications find it more useful to model their datasets with the graph model, where information is represented with entities (nodes) and relationships (edges). Such datasets are known as data graphs, graph databases, knowledge graphs, or, more generally, information graphs. The richness of their schema and of their content makes it challenging for users to express appropriate queries and retrieve the desired results. Hence, to allow an effective exploration of a graph, we require: (i) an expressive query paradigm, (ii) an intuitive query mechanism, and (iii) an appropriate storage and query processing system. In this work, we address these three requirements.
An exploratory query should be simple enough to avoid complicated declarative languages (such as SQL or SPARQL) and, at the same time, retain the flexibility and expressiveness of such languages. For this reason, with respect to the query paradigm, we introduce the notion of exemplar queries and propose extensions to handle multiple incomplete examples. An exemplar query is a query method in which the user, or the analyst, circumvents query languages by using examples as input. In particular, the solution we design allows flexible matching in the case of incomplete or partially specified examples.
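To make the idea of querying by example concrete, the following minimal sketch treats a graph as a set of (subject, label, object) triples and returns every subgraph that has the same edge-labelled structure as a user-supplied example. This is an illustration under simplifying assumptions, not the method developed in the dissertation: it uses strict, label-preserving subgraph isomorphism, it performs no ranking and no flexible matching of incomplete examples, and the function name exemplar_answers and the toy data are hypothetical.

```python
def exemplar_answers(graph, example):
    """Return node mappings from `example` to structurally identical subgraphs.

    Both arguments are sets of (subject, label, object) triples.
    Assumption: "similar" means an injective, edge-label-preserving match.
    """
    example_edges = sorted(example)
    answers = []

    def extend(index, mapping):
        # All example edges are matched: record the mapping,
        # skipping the identity mapping (the example itself).
        if index == len(example_edges):
            if any(src != dst for src, dst in mapping.items()):
                answers.append(dict(mapping))
            return
        subj, label, obj = example_edges[index]
        for g_subj, g_label, g_obj in graph:
            if g_label != label:
                continue
            candidate = dict(mapping)
            consistent = True
            for ex_node, g_node in ((subj, g_subj), (obj, g_obj)):
                if ex_node in candidate:
                    # An already-mapped example node must keep its image.
                    if candidate[ex_node] != g_node:
                        consistent = False
                        break
                elif g_node in candidate.values():
                    # Keep the mapping injective.
                    consistent = False
                    break
                else:
                    candidate[ex_node] = g_node
            if consistent:
                extend(index + 1, candidate)

    extend(0, {})
    return answers


# Hypothetical toy knowledge graph.
graph = {
    ("Google", "acquired", "YouTube"),
    ("Google", "headquartered_in", "California"),
    ("Yahoo", "acquired", "Tumblr"),
    ("Yahoo", "headquartered_in", "California"),
    ("Microsoft", "acquired", "Skype"),
    ("Microsoft", "headquartered_in", "Washington"),
    ("Larry Page", "founded", "Google"),
}

# The user supplies one known instance instead of writing a query.
example = {
    ("Google", "acquired", "YouTube"),
    ("Google", "headquartered_in", "California"),
}

answers = exemplar_answers(graph, example)
acquirers = {mapping["Google"] for mapping in answers}
```

Here the user never writes a structured query: the single example about Google stands in for the intent "companies, their acquisitions, and where they are headquartered", and the other two companies in the toy graph are returned because they participate in the same pattern. The backtracking search is exponential in the worst case, so it is suitable only for illustration on small graphs.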
Moreover, to enable this query paradigm, there is a need for interactive systems that implement an incremental query-construction mechanism and interactive exploration. To address this need, we study algorithms and implementations based on pseudo-relevance feedback for exemplar query suggestion, along with an in-depth study of their effectiveness.
Finally, many graph databases exist, and they differ widely in functionality and performance. We provide an exhaustive evaluation methodology and a comprehensive study of the existing systems that allows us to understand their capabilities and limitations. In particular, we design a novel micro-benchmarking framework to assess the functionality of some of the most prominent graph databases in the area, and we provide detailed insights into their performance.