An Experimental Comparison of Graph Databases
We are witnessing an increasing interest in graph data. The need for efficient and effective storage and querying of such data has led to the development of graph databases. Graph databases are a relatively new technology, and their requirements and specifications are not yet fully understood. As a result, these systems differ widely in both functionality and performance. In this work we provide a comprehensive study of the existing systems in order to understand their capabilities and limitations.
Background
Previous similar efforts have fallen short of providing a complete evaluation of graph databases and of drawing a clear picture of how they compare to each other. We introduce a micro-benchmarking framework for assessing the functionalities of the existing systems and provide detailed insights into their performance. We support a broader spectrum of test queries and conduct the evaluations on both synthetic and real data, at scales much larger than in previous work. We offer a systematic evaluation framework that we have materialized into an evaluation suite. The framework is extensible, so other datasets, systems or queries can easily be included in the evaluation.
Graph databases are grounded in concepts from graph theory: they abstract data in the form of nodes, edges and properties. Graph database models can be characterized as those where the data structures for the schema and the instances are modelled as graphs or generalizations of them, and data manipulation is expressed by graph-oriented operations and type constructors.
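To make the abstraction of nodes, edges and properties concrete, the following is a minimal in-memory sketch in Python. It is only an illustration: the names PropertyGraph, Node, Edge, add_node, add_edge and neighbors are hypothetical and do not correspond to the API of any of the evaluated systems.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Node:
    """A node with an identifier and arbitrary key/value properties."""
    id: int
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    """A directed, labelled edge that can also carry properties."""
    source: int
    target: int
    label: str
    properties: Dict[str, Any] = field(default_factory=dict)


class PropertyGraph:
    """Hypothetical toy property graph stored as adjacency lists."""

    def __init__(self) -> None:
        self.nodes: Dict[int, Node] = {}
        self.out_edges: Dict[int, List[Edge]] = {}

    def add_node(self, node_id: int, **properties: Any) -> None:
        self.nodes[node_id] = Node(node_id, dict(properties))
        self.out_edges.setdefault(node_id, [])

    def add_edge(self, source: int, target: int, label: str, **properties: Any) -> None:
        # Both endpoints must already exist in the graph.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added before the edge")
        self.out_edges[source].append(Edge(source, target, label, dict(properties)))

    def neighbors(self, node_id: int, label: Optional[str] = None) -> List[int]:
        # A graph-oriented operation: follow outgoing edges, optionally by label.
        return [e.target for e in self.out_edges[node_id]
                if label is None or e.label == label]
```

For example, after g = PropertyGraph(), g.add_node(1, name="a"), g.add_node(2, name="b") and g.add_edge(1, 2, "knows"), the call g.neighbors(1, label="knows") follows the single outgoing "knows" edge of node 1.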
GDB Test-suite: Code for Testing
The GDB Test-suite uses Docker, and the code for running each system and the experiments is released as open source in a Git repository at this link.
You may freely use this code for research purposes, provided that you properly acknowledge the authors.
Graph Data
We distribute here the datasets used in the tests.
You can download all of the files, or only some of them, from Google Drive.
The datasets used in the tests are stored in GraphSON format for the versions of the engines that support TinkerPop 3.
Systems using TinkerPop 2 support GraphSON 1.0 instead.
Our datasets can easily be converted to a newer or older version. For an example, see our Docker image.
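Before converting or loading a dataset, it can help to stream through a compressed file without unpacking it on disk. The sketch below assumes that a file stores one JSON object per line. This layout is an assumption and should be checked against the actual files, since a file that holds a single JSON document would need to be parsed whole instead. The helper names iter_json_lines and count_records are hypothetical.

```python
import gzip
import json


def iter_json_lines(path):
    """Yield one parsed JSON object per non-empty line.

    Assumes a line-delimited layout (an assumption, not verified
    against the distributed files). Works on plain or gzipped input.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def count_records(path):
    """Count the JSON records in a file without loading it all in memory."""
    return sum(1 for _ in iter_json_lines(path))
```

Reading line by line keeps memory usage bounded, which matters for the larger files in the collection.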
The MiCo Dataset comes from the authors of GraMi.
For more details, you can read «GRAMI: Frequent Subgraph and Pattern Mining in a Single Large Graph». PVLDB, 7(7):517-528, 2014, by Mohammed Elseidy, Ehab Abdelhamid, Spiros Skiadopoulos, and Panos Kalnis.
The Yeast Dataset has been converted from the version that V. Batagelj transformed into Pajek format. The original dataset comes from «Topological structure analysis of the protein-protein interaction network in budding yeast». Nucleic Acids Research, 2003, Vol. 31, No. 9, 2443-2450, by Shiwei Sun, Lunjiang Ling, Nan Zhang, Guojie Li and Runsheng Chen.
Moreover, you can read about the details of our Freebase ExQ datasets, or you can use our Docker image to generate the LDBC synthetic dataset.
Details on file sizes
File                        Size
datasets.md5                4K
yeast.json                  1.5M
yeast.json.gz               180K
mico.json                   84M
mico.json.gz                12M
ldbc.json                   144M
ldbc.json.gz                13M
freebase_org.json           584M
freebase_org.json.gz        81M
freebase_small.json         87M
freebase_small.json.gz      12M
freebase_medium.json        816M
freebase_medium.json.gz     117M
freebase_large.json         6.3G
freebase_large.json.gz      616M
freebase_test.json          4K
freebase_test.json.gz       4K
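Since the collection includes a datasets.md5 file, downloads can be checked for corruption before running the tests. The sketch below assumes that datasets.md5 follows the usual md5sum layout, with one "digest filename" pair per line. The actual layout of the file is not documented here, so treat this as an assumption. The function names md5_of_file, parse_checksums and verify_downloads are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 digest of a file in chunks, so large files fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_checksums(text):
    """Parse 'digest  filename' lines (assumed md5sum layout) into a dict."""
    expected = {}
    for line in text.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) == 2:
            # md5sum may prefix the name with '*' to mark binary mode.
            expected[parts[1].lstrip("*").strip()] = parts[0].lower()
    return expected


def verify_downloads(checksum_file, directory="."):
    """Return {filename: True/False} for every listed file present in directory.

    Files that were not downloaded are skipped, since only part of the
    collection may have been fetched.
    """
    expected = parse_checksums(Path(checksum_file).read_text())
    results = {}
    for name, digest in expected.items():
        target = Path(directory) / name
        if target.exists():
            results[name] = md5_of_file(target) == digest
    return results
```

Calling verify_downloads("datasets.md5", ".") in the download directory returns a dictionary in which any entry mapped to False marks a file that should be downloaded again.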