An Experimental Comparison of Graph Databases

We are witnessing an increasing interest in graph data. The need for efficient and effective storage and querying of such data has led to the development of graph databases. Graph databases are a relatively new technology, and their requirements and specifications are not yet fully understood. As a result, the functionality and performance of these systems vary widely. In this work we provide a comprehensive study of the existing systems in order to understand their capabilities and limitations.

Visit the Project Page

Background

Previous similar efforts have fallen short of providing a complete evaluation of graph databases and of drawing a clear picture of how they compare to each other. We introduce a micro-benchmarking framework for assessing the functionalities of the existing systems and provide detailed insights into their performance. We support a broader spectrum of test queries and conduct the evaluations on both synthetic and real data, at scales much larger than in previous studies. We offer a systematic evaluation framework that we have materialized into an evaluation suite. The framework is extensible, so other datasets, systems, or queries can easily be included in the evaluation.

A graph database is grounded in graph theory: it abstracts data in the form of nodes, edges, and properties. Graph database models can be characterized as those where the data structures for the schema and the instances are modelled as graphs or generalizations of them, and where data manipulation is expressed by graph-oriented operations and type constructors.
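To make the node, edge, and property abstraction concrete, here is a minimal in-memory sketch in Python. It is only an illustration of the data model, not how any of the evaluated systems stores data; the class and method names (`PropertyGraph`, `add_node`, `add_edge`, `neighbors`) are hypothetical and chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Node:
    id: int
    label: str
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    id: int
    label: str
    source: int  # id of the node the edge leaves
    target: int  # id of the node the edge enters
    properties: Dict[str, Any] = field(default_factory=dict)


class PropertyGraph:
    """Hypothetical toy model: labelled nodes and edges, both with properties."""

    def __init__(self) -> None:
        self.nodes: Dict[int, Node] = {}
        self.edges: Dict[int, Edge] = {}
        self._out: Dict[int, List[int]] = {}  # node id -> ids of outgoing edges

    def add_node(self, node_id: int, label: str, **properties: Any) -> Node:
        node = Node(node_id, label, dict(properties))
        self.nodes[node_id] = node
        self._out.setdefault(node_id, [])
        return node

    def add_edge(self, edge_id: int, label: str, source: int, target: int,
                 **properties: Any) -> Edge:
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        edge = Edge(edge_id, label, source, target, dict(properties))
        self.edges[edge_id] = edge
        self._out[source].append(edge_id)
        return edge

    def neighbors(self, node_id: int, edge_label: Optional[str] = None) -> List[Node]:
        """A simple graph-oriented operation: follow outgoing edges, optionally by label."""
        result = []
        for edge_id in self._out[node_id]:
            edge = self.edges[edge_id]
            if edge_label is None or edge.label == edge_label:
                result.append(self.nodes[edge.target])
        return result
```

For example, after adding two `person` nodes and a `knows` edge between them, `neighbors(1, "knows")` returns the node at the other end of that edge. Real systems add indexing, persistence, and a query language on top of a model like this.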

GDB Test-suite: Code for Testing

The GDB Test-suite uses Docker, and the code for running each system and the experiments is released as open source in a Git repository at this link.

You may freely use this code for research purposes, provided that you properly acknowledge the authors.

Graph Data

We distribute here the datasets used in the tests.
You can download all of the files or only some of them; they are stored on Google Drive.

The datasets used in the tests are stored in the GraphSON format used by the versions of the engines that support TinkerPop 3.

Systems using TinkerPop 2 instead support GraphSON 1.0.

Our datasets can easily be converted to a newer or older version of the format. For an example, see our Docker image.
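Before loading or converting a dataset, it can help to check how many vertices and edges it contains. The following Python sketch does this under an explicit assumption: that the file uses a line-delimited adjacency layout, with one JSON vertex object per line and outgoing edges grouped by label under an `outE` key. This layout is an assumption and should be checked against the actual files; the function names are hypothetical.

```python
import gzip
import json
from typing import Tuple


def open_maybe_gzip(path: str):
    """Open a plain or gzip-compressed text file based on its extension."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")


def count_graph(path: str) -> Tuple[int, int]:
    """Return (vertex count, edge count) for a line-delimited file.

    Assumed layout (not confirmed by the source): one JSON object per line,
    each describing a vertex, with outgoing edges stored as
    {"outE": {"<edge label>": [ {...}, ... ]}}.
    Edges are counted once, from the outgoing side only.
    """
    vertices = 0
    edges = 0
    with open_maybe_gzip(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            vertex = json.loads(line)
            vertices += 1
            for edge_list in vertex.get("outE", {}).values():
                edges += len(edge_list)
    return vertices, edges
```

Because the file is read line by line, memory use stays small even for the larger datasets. A file stored as a single JSON document would need a different reader.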

The MiCo Dataset comes from the authors of GraMi.
For more details, see Mohammed Elseidy, Ehab Abdelhamid, Spiros Skiadopoulos, and Panos Kalnis, «GRAMI: Frequent Subgraph and Pattern Mining in a Single Large Graph», PVLDB, 7(7):517-528, 2014.

The Yeast Dataset has been converted from the version transformed into Pajek format by V. Batagelj. The original dataset comes from Shiwei Sun, Lunjiang Ling, Nan Zhang, Guojie Li, and Runsheng Chen, «Topological structure analysis of the protein-protein interaction network in budding yeast», Nucleic Acids Research, 2003, Vol. 31, No. 9, 2443-2450.

Moreover, you can read about the details of our Freebase ExQ datasets, or you can use our Docker image to generate the LDBC synthetic dataset.

Details on file sizes


File                        Size

datasets.md5                  4K

yeast.json                  1.5M
yeast.json.gz               180K

mico.json                    84M
mico.json.gz                 12M

ldbc.json                   144M
ldbc.json.gz                 13M

freebase_org.json           584M
freebase_org.json.gz         81M

freebase_small.json          87M
freebase_small.json.gz       12M

freebase_medium.json        816M
freebase_medium.json.gz     117M

freebase_large.json         6.3G
freebase_large.json.gz      616M

freebase_test.json            4K
freebase_test.json.gz         4K
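
The `datasets.md5` file listed above can be used to check that the downloads are intact. The Python sketch below assumes that the file follows the usual `md5sum` layout, with one `<checksum>  <filename>` pair per line; this layout is an assumption, and the function names are hypothetical.

```python
import hashlib
import os
from typing import Dict


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def read_checksums(md5_path: str) -> Dict[str, str]:
    """Parse lines of the assumed form '<checksum>  <filename>'."""
    expected = {}
    with open(md5_path, "r", encoding="utf-8") as handle:
        for line in handle:
            parts = line.split(None, 1)
            if len(parts) != 2:
                continue  # skip blank or malformed lines
            checksum, name = parts
            # md5sum marks binary mode with a leading '*' before the name.
            expected[name.strip().lstrip("*")] = checksum.lower()
    return expected


def verify(directory: str, md5_name: str = "datasets.md5") -> Dict[str, str]:
    """Return 'ok', 'mismatch', or 'missing' for each file named in the checksum list."""
    expected = read_checksums(os.path.join(directory, md5_name))
    report = {}
    for name, checksum in expected.items():
        path = os.path.join(directory, name)
        if not os.path.exists(path):
            report[name] = "missing"
        elif md5_of_file(path) == checksum:
            report[name] = "ok"
        else:
            report[name] = "mismatch"
    return report
```

Calling `verify` on the download directory reports files that were not downloaded as `missing`, so the check also works when only part of the collection has been fetched.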