An Experimental Comparison of Graph Databases
We are witnessing an increasing interest in graph data. The need for efficient and effective storage and querying of such data has led to the development of graph databases. Graph databases are a relatively new technology, and their requirements and specifications are not yet fully understood. As a result, these systems differ widely in both functionality and performance. In this work we provide a comprehensive study of the existing systems in order to understand their capabilities and limitations.
Previous similar efforts have fallen short of providing a complete evaluation of graph databases and of drawing a clear picture of how they compare to each other. We introduce a micro-benchmarking framework for assessing the functionalities of the existing systems and provide detailed insights into their performance. We support a broader spectrum of test queries and conduct the evaluations on both synthetic and real data at scales much larger than what has been done so far. We offer a systematic evaluation framework that we have materialized into an evaluation suite. The framework is extensible, allowing other datasets, systems or queries to be included in the evaluation easily.
Graph databases are grounded in graph theory: they abstract data in the form of nodes, edges and properties. Graph database models can be characterized as those in which the data structures for the schema and the instances are modelled as graphs or generalizations of them, and data manipulation is expressed by graph-oriented operations and type constructors.
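To make the node/edge/property abstraction concrete, here is a minimal in-memory sketch in Python. It is only an illustration of the data model, not the storage layout of any of the evaluated systems; the names `Node`, `Edge`, `PropertyGraph` and `neighbors` are hypothetical and chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Node:
    id: int
    label: str
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    id: int
    label: str
    source: int  # id of the node the edge leaves
    target: int  # id of the node the edge enters
    properties: Dict[str, Any] = field(default_factory=dict)


class PropertyGraph:
    """Hypothetical in-memory property graph with an outgoing-edge index."""

    def __init__(self) -> None:
        self.nodes: Dict[int, Node] = {}
        self.edges: Dict[int, Edge] = {}
        self._out: Dict[int, List[int]] = {}  # node id -> ids of outgoing edges

    def add_node(self, node: Node) -> None:
        self.nodes[node.id] = node
        self._out.setdefault(node.id, [])

    def add_edge(self, edge: Edge) -> None:
        if edge.source not in self.nodes or edge.target not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        self.edges[edge.id] = edge
        self._out[edge.source].append(edge.id)

    def neighbors(self, node_id: int, label: Optional[str] = None) -> List[Node]:
        """A graph-oriented operation: follow outgoing edges, optionally by label."""
        result = []
        for edge_id in self._out.get(node_id, []):
            edge = self.edges[edge_id]
            if label is None or edge.label == label:
                result.append(self.nodes[edge.target])
        return result
```

A traversal such as `g.neighbors(1, "knows")` is the kind of graph-oriented operation the definition above refers to: it is expressed in terms of nodes and edges rather than tables or documents.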
GDB Test-suite: Code for Testing
The GDB Test-suite uses Docker, and the code for running each system and the experiments is released as open source in a Git repository at this link.
You may freely use this code for research purposes, provided that you properly acknowledge the authors.
The datasets used in the tests are stored in GraphSON format for the versions of the engines that support TinkerPop 3. Systems using TinkerPop 2 instead support GraphSON 1.0. Our datasets can easily be converted to a newer or older version. For an example, see our Docker image.
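As a rough illustration of what such a conversion involves, the sketch below turns line-delimited TinkerPop 3 GraphSON into a GraphSON 1.0 style document. This is not the converter shipped in the Docker image. It assumes (without confirmation from the dataset files themselves) that each input line is one vertex with `id`, optional `properties` given as lists of `{"id", "value"}` objects, and optional `outE` edges grouped by label, and that the target uses the TinkerPop 2 layout with `_id`, `_type`, `_outV`, `_inV` and `_label` keys. The function name `tp3_lines_to_graphson1` is hypothetical.

```python
import json
from typing import Any, Dict, Iterable


def tp3_lines_to_graphson1(lines: Iterable[str]) -> Dict[str, Any]:
    """Convert adjacency-list GraphSON lines into a single vertices/edges document.

    ASSUMPTION: the input and output layouts are those described in the
    lead-in; check them against the actual files before relying on this.
    """
    vertices, edges = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        vertex = json.loads(line)

        # Vertex properties: keep only the first value of each multi-valued key.
        out_vertex: Dict[str, Any] = {}
        for key, values in vertex.get("properties", {}).items():
            out_vertex[key] = values[0]["value"]
        # The vertex label is dropped here, since this target layout has no slot for it.
        out_vertex.update({"_id": vertex["id"], "_type": "vertex"})
        vertices.append(out_vertex)

        # Read only outgoing edges so that each edge is emitted exactly once.
        for label, out_edges in vertex.get("outE", {}).items():
            for edge in out_edges:
                out_edge: Dict[str, Any] = dict(edge.get("properties", {}))
                out_edge.update({
                    "_id": edge["id"],
                    "_type": "edge",
                    "_outV": vertex["id"],
                    "_inV": edge["inV"],
                    "_label": label,
                })
                edges.append(out_edge)
    return {"graph": {"mode": "NORMAL", "vertices": vertices, "edges": edges}}
```

With a file opened as `fh`, `json.dump(tp3_lines_to_graphson1(fh), out)` would write the converted document. For the larger datasets a streaming writer would be preferable, since this sketch holds the whole graph in memory.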
The MiCo Dataset comes from the authors of GraMi.
For more details, see «GRAMI: Frequent Subgraph and Pattern Mining in a Single Large Graph» by Mohammed Elseidy, Ehab Abdelhamid, Spiros Skiadopoulos, and Panos Kalnis, PVLDB, 7(7):517-528, 2014.
The Yeast Dataset has been converted from the version transformed into Pajek format by V. Batagelj. The original dataset comes from «Topological structure analysis of the protein-protein interaction network in budding yeast» by Shiwei Sun, Lunjiang Ling, Nan Zhang, Guojie Li and Runsheng Chen, Nucleic Acids Research, 2003, Vol. 31, No. 9, 2443-2450.
You can also read about the details of our Freebase ExQ datasets, or use our Docker image to generate the LDBC synthetic dataset.
Details on file sizes
File                      Size
datasets.md5              4K
yeast.json                1.5M
yeast.json.gz             180K
mico.json                 84M
mico.json.gz              12M
ldbc.json                 144M
ldbc.json.gz              13M
freebase_org.json         584M
freebase_org.json.gz      81M
freebase_small.json       87M
freebase_small.json.gz    12M
freebase_medium.json      816M
freebase_medium.json.gz   117M
freebase_large.json       6.3G
freebase_large.json.gz    616M
freebase_test.json        4K
freebase_test.json.gz     4K
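Since the listing includes a `datasets.md5` file, downloads can be checked before running the tests. The sketch below is one way to do that in Python. It assumes that `datasets.md5` uses the usual `md5sum` layout (one `<hex digest>  <file name>` pair per line), which is not confirmed here; the helper names `md5_of` and `verify` are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so that multi-gigabyte datasets fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(manifest: Path) -> Dict[str, bool]:
    """Return {file name: True if present and matching} for each manifest entry.

    ASSUMPTION: the manifest follows the md5sum text format.
    """
    results: Dict[str, bool] = {}
    for line in manifest.read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # md5sum prefixes binary-mode entries with '*'
        target = manifest.parent / name
        results[name] = target.is_file() and md5_of(target) == expected.lower()
    return results
```

Calling `verify(Path("datasets.md5"))` in the download directory reports which files are missing or corrupted; files that were not downloaded simply map to `False`.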