Optimizing SPARQL Queries using Shape Statistics

Kashif Rabbani, Matteo Lissandrini, Katja Hose

The final authenticated version is available online at 10.5441/002/edbt.2021.59.

Abstract:

With the growing popularity of storing data in native RDF, we witness more and more diverse use cases with complex SPARQL queries. As a consequence, query optimization, and in particular cardinality estimation and join ordering, becomes even more crucial. Classical methods exploit global statistics covering the entire RDF graph as a whole. Such statistics naturally fail to capture the correlations that are very common in RDF datasets, which leads to erroneous cardinality estimates and suboptimal query execution plans. The alternative of capturing correlations in a fine-grained manner, on the other hand, requires very costly preprocessing steps to create these statistics. Hence, in this paper we propose shapes statistics, which extend the recent SHACL standard with statistical information to capture the correlation between classes and properties. Our extensive experiments on synthetic and real data show that shapes statistics can be generated and managed with little overhead and without disadvantages in query runtime, while leading to noticeable improvements in cardinality estimation.

Cite:

Rabbani, Kashif; Lissandrini, Matteo; and Hose, Katja.
“Optimizing SPARQL Queries using Shape Statistics.”
Proceedings of the 24th International Conference on Extending Database Technology, EDBT 2021,
March 2021 (505-510).


@inproceedings{DBLP:conf/edbt/RabbaniLH21,
  author    = {Kashif Rabbani and
               Matteo Lissandrini and
               Katja Hose},
  title     = {Optimizing {SPARQL} Queries using Shape Statistics},
  booktitle = {Proceedings of the 24th International Conference on Extending Database
               Technology, {EDBT} 2021, Nicosia, Cyprus, March 23 - 26, 2021},
  pages     = {505--510},
  publisher = {OpenProceedings.org},
  year      = {2021},
  url       = {https://doi.org/10.5441/002/edbt.2021.59},
  doi       = {10.5441/002/edbt.2021.59},
  timestamp = {Thu, 14 Oct 2021 10:06:58 +0200},
  biburl    = {https://dblp.org/rec/conf/edbt/RabbaniLH21.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}