Lately I've been using Virtuoso to run some SPARQL queries. Here is my quick setup.

I also provide a custom configuration file (for machines with more memory), the setup for working with a RAM disk (for fast read-only data), a GitHub gist, and instructions for loading data.

Set up a Docker container for Virtuoso

The people at OpenLink provide a Docker image for the open-source version of their software. We pull that image, then prepare a folder for the database (so that we do not lose it if we kill the container) and a folder for the data to be imported. I also provide a customized virtuoso.ini file.

docker pull openlink/virtuoso_opensource:latest

mkdir -p database
cp virtuoso.ini.example database/virtuoso.ini

mkdir -p import

We run the container with the database and import folders mounted as volumes; here the container is named vos. Note that I do not use --rm, so that I can restart the container if I want. If you add --rm, the container is removed automatically when it stops.

docker run --name vos -d -v "$(pwd)"/database:/opt/virtuoso-opensource/database \
           -v "$(pwd)"/import:/import \
           -t -p 1111:1111 -p 8890:8890 -i openlink/virtuoso_opensource:latest

The commands above require a custom virtuoso.ini file (provided here). My main edits come from the need to query a large dataset and to process large result sets. More information on the parameters is available in the official documentation.

My edits below are for a machine with ~64GB of RAM, and may not be optimal in general, so YMMV.

  1. Allow access to the /import folder, where we put the files to be imported

    DirsAllowed       = ., /opt/virtuoso-opensource/vad, /import
  2. Change the memory size thresholds: uncomment the following lines, and comment out the corresponding two below them (comments start with ;)

    NumberOfBuffers  = 4000000
    MaxDirtyBuffers  = 3000000
    ;NumberOfBuffers = 10000
    ;MaxDirtyBuffers = 6000

    A few lines earlier you may also want to change

    MaxQueryMem    = 4G       ; memory allocated to query processor
    VectorSize     = 2000     ; initial parallel query vector (array of query operations) size
    MaxVectorSize  = 20000000 ; query vector size threshold.
  3. Longer keep-alive for large queries

    KeepAliveTimeout    = 30
  4. Allow for larger result sets

    ResultSetMaxRows            = 50000
    MaxQueryCostEstimationTime  = 0  ; in seconds
    MaxQueryExecutionTime       = 600   ; in seconds
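
If your machine has a different amount of memory, the buffer values can be scaled. The following is only a rough sketch: it assumes about 85000 buffers per GB of free RAM, which is the ratio used by the 2 GB row (170000 buffers) of the commented table in the stock virtuoso.ini, and it sets MaxDirtyBuffers to three quarters of that. The function name suggest_buffers is mine, not part of Virtuoso, and the stock table itself does not follow this ratio exactly for every row.

```shell
# Print suggested buffer settings for a given amount of free RAM (in GB).
# Assumption: ~85000 buffers per GB (ratio of the 2 GB row in the stock
# virtuoso.ini) and MaxDirtyBuffers = 3/4 of NumberOfBuffers.
suggest_buffers() (
  ram_gb=$1
  buffers=$((ram_gb * 85000))
  dirty=$((buffers * 3 / 4))
  printf 'NumberOfBuffers = %d\n' "$buffers"
  printf 'MaxDirtyBuffers = %d\n' "$dirty"
)

# For 48 GB this gives values close to the 4000000 / 3000000 used above.
suggest_buffers 48
```

Treat the output as a starting point and check it against the official documentation before pasting it into virtuoso.ini.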

A Gist

The contents of this readme and of the ini file can be found in this GitHub gist.

Add a comment there if you have any feedback.

To use a RAM disk (8 GB in this example)

This setup is meant for read-only workloads: serving the database from RAM makes queries faster, but every change is lost when the RAM disk is unmounted or the machine reboots.

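sudo mkdir -p /media/ramdisk1
sudo mount -t tmpfs -o size=8192M tmpfs /media/ramdisk1
# copy the existing database onto the RAM disk (with the previous vos container stopped and removed)
sudo cp -a database /media/ramdisk1/

docker run --name vos -d -v /media/ramdisk1/database:/opt/virtuoso-opensource/database \
           -v "$(pwd)"/import:/import \
           -t -p 1111:1111 -p 8890:8890 -i openlink/virtuoso_opensource:latest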
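
Before putting a database folder on the RAM disk, it is worth checking that it fits in the space you mounted. This is a small sketch of such a check; the function name fits_in_ramdisk is hypothetical, and it only compares the folder size reported by du with a limit in megabytes, so it does not account for growth while Virtuoso is running.

```shell
# Succeed if directory $1 uses at most $2 megabytes on disk.
# Uses du -sk (kilobytes) and compares against the limit converted to KB.
fits_in_ramdisk() (
  dir=$1
  limit_mb=$2
  used_kb=$(du -sk "$dir" | cut -f1)
  [ "$used_kb" -le $((limit_mb * 1024)) ]
)

# Example (hypothetical usage, matching the 8192M tmpfs above):
# fits_in_ramdisk database 8192 || echo "database is too large for the RAM disk"
```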

Run the CLI

docker exec -it vos /opt/virtuoso-opensource/bin/isql

Create graphs

SPARQL create GRAPH <>;

Import data

delete from DB.DBA.load_list;
ld_dir ('/import', 'my_file.ttl', '');
rdf_loader_run ();
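
If you load data often, the steps above can be scripted instead of typed into the CLI. The sketch below only prints an isql script; make_load_script is a name I made up, and the graph IRI http://example.org/my-graph is a placeholder to replace with your own. Two statements are my additions to the steps above: checkpoint; to make the loaded data durable, and a final query on DB.DBA.load_list to see the state of each file (ll_state 2 means done, and ll_error is NULL when the load went fine).

```shell
# Print an isql script that bulk-loads files matching mask $1
# from /import into the graph with IRI $2.
make_load_script() (
  mask=$1
  graph=$2
  printf "DELETE FROM DB.DBA.load_list;\n"
  printf "ld_dir ('/import', '%s', '%s');\n" "$mask" "$graph"
  printf "rdf_loader_run ();\n"
  printf "checkpoint;\n"   # addition: persist the loaded data
  printf "SELECT ll_file, ll_state, ll_error FROM DB.DBA.load_list;\n"
)

# Placeholder file mask and graph IRI, for illustration only.
make_load_script 'my_file.ttl' 'http://example.org/my-graph'
```

With the container running, the output can be piped to the CLI, for example make_load_script 'my_file.ttl' 'http://example.org/my-graph' | docker exec -i vos /opt/virtuoso-opensource/bin/isql (note -i without -t, since the input comes from a pipe).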

Check existing graphs

SPARQL SELECT DISTINCT ?g WHERE { GRAPH ?g {?s ?p ?t} };
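
The same kind of check can be run from outside the container over HTTP, since port 8890 is mapped to the host. This is a sketch that assumes Virtuoso's default SPARQL endpoint at /sparql on that port; sparql_query is a hypothetical helper name, and it simply wraps curl.

```shell
# Send SPARQL query $2 to endpoint $1 and print the results as JSON.
sparql_query() (
  endpoint=$1
  query=$2
  curl --silent --get \
       --header 'Accept: application/sparql-results+json' \
       --data-urlencode "query=$query" \
       "$endpoint"
)

# Example (needs the container to be running):
# sparql_query http://localhost:8890/sparql \
#   'SELECT DISTINCT ?g WHERE { GRAPH ?g { ?s ?p ?o } }'
```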