Matteo Lissandrini, PhD. | RSS Feed
Publications and Notes from Matteo Lissandrini's Academic Web Page: Assistant Professor, PhD, Data Exploration, Data Management, Data Science.
2023-11-03T12:50:18Z
https://people.cs.aau.dk/~matteo

SHACTOR: Improving the Quality of Large-Scale Knowledge Graphs with Validating Shapes
2023-06-18T00:00:00Z
https://people.cs.aau.dk/publications/demo/2023-sigmod-shactor-demo.html
<h1 id="shactor%3A-improving-the-quality-of-large-scale-knowledge-graphs-with-validating-shapes" tabindex="-1">SHACTOR: Improving the Quality of Large-Scale Knowledge Graphs with Validating Shapes</h1>
<h2 id="kashif-rabbani%2C-matteo-lissandrini%2C-and-katja-hose" tabindex="-1">Kashif Rabbani, Matteo Lissandrini, and Katja Hose</h2>
<section class="links">
<p>
Download a digital copy (<a class="attachment" href="https://people.cs.aau.dk/~matteo/pdf/SIGMOD23-SHACTOR-demo.pdf">PDF</a>)
<!-- or watch <a class="attachment" href="https://www.youtube.com/watch?v=HpW-mgc130o" >the Demo (on YouTube)</a> -->
</p><p>
The final authenticated version is available online at <a href="https://doi.org/10.1145/3555041.3589723">10.1145/3555041.3589723</a>.
</p></section>
<h3 id="this-work-will-be-presented-tue.-june-20th%2C-2023" tabindex="-1">This work will be presented <a href="https://2023.sigmod.org/sigmod_demo_list.shtml">Tue. June 20th, 2023</a></h3>
<h3 id="abstract%3A" tabindex="-1">Abstract:</h3>
<section class="intro secsubead">
We demonstrate SHACTOR, a system for extracting and analyzing validating shapes from very large Knowledge Graphs (KGs).
Shapes represent a specific form of data patterns, akin to schemas for entities.
Standard shape extraction approaches are likely to produce thousands of shapes, and some of those represent spurious constraints extracted due to the presence of erroneous data in the KG.
Given a KG having tens of millions of triples and thousands of classes, SHACTOR parses the KG using our efficient and scalable shapes extraction algorithm and outputs SHACL shapes constraints.
The extracted shapes are further annotated with statistical information regarding their support in the graph, which makes it possible to identify both erroneous and missing triples in the KG.
Hence, SHACTOR can be used to extract, analyze, and clean shape constraints from very large KGs.
Furthermore, it enables the user to find and correct errors by automatically generating SPARQL queries over the graph that retrieve the nodes and facts at the source of the spurious shapes, so that the user can intervene and amend the data.
</section>
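<p>The support-based pruning described in the abstract can be illustrated with a toy example. The sketch below is our own simplification, not SHACTOR's actual algorithm: it counts, for a given class, how many entities use each property, and flags rarely supported candidates as likely spurious.</p>
<pre><code class="lang-python">
# Illustrative sketch (not SHACTOR's actual algorithm): annotate candidate
# property shapes with their support, i.e., how many entities of a class
# use a given property, and flag low-support candidates as likely spurious.
from collections import defaultdict

def shape_support(triples, target_class):
    """Return {property: support} for entities typed as target_class."""
    # Collect entities of the target class (rdf:type abbreviated as "a").
    entities = {s for s, p, o in triples if p == "a" and o == target_class}
    support = defaultdict(set)
    for s, p, o in triples:
        if s in entities and p != "a":
            support[p].add(s)
    return {p: len(subjects) for p, subjects in support.items()}

triples = [
    ("alice", "a", "Person"), ("bob", "a", "Person"), ("carol", "a", "Person"),
    ("alice", "name", "Alice"), ("bob", "name", "Bob"), ("carol", "name", "Carol"),
    ("bob", "birthYear", "Person"),  # an erroneous triple producing a rare pattern
]

counts = shape_support(triples, "Person")
# Properties whose support falls below a threshold are candidate spurious shapes.
spurious = [p for p, n in counts.items() if n < 2]
print(counts)     # {'name': 3, 'birthYear': 1}
print(spurious)   # ['birthYear']
</code></pre>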
<h3 id="cite%3A" tabindex="-1">Cite:</h3>
<blockquote itemscope="" itemtype="http://schema.org/ScholarlyArticle">
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Rabbani, Kashif</span>;
</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Lissandrini, Matteo</span>;
</span>
and
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Hose, Katja</span>.
</span>
<br>
“<span itemprop="headline name">SHACTOR: Improving the Quality of Large-Scale Knowledge Graphs with Validating Shapes</span>.”
<div class="hidden">
<time datetime="2023-06-18" itemprop="datePublished">June, 2023</time>
</div>
<span itemprop="isPartOf" itemscope="" itemtype="http://schema.org/Periodical">
<em><span itemprop="name">Companion of the 2023 International Conference on Management of Data (SIGMOD’23)</span></em>
</span>
.
</blockquote>
<pre><code class="lang-bibtex">
@inbook{RabbaniSHACTOR23,
author = {Rabbani, Kashif and Lissandrini, Matteo and Hose, Katja},
title = {SHACTOR: Improving the Quality of Large-Scale Knowledge Graphs with Validating Shapes},
year = {2023},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
doi = {10.1145/3555041.3589723},
booktitle = {Companion of the 2023 International Conference on Management of Data},
numpages = {4}
}
</code></pre>
Extraction of Validating Shapes from very large Knowledge Graphs
2022-12-01T00:00:00Z
https://people.cs.aau.dk/publications/journal/2023-vldb-shapes.html
<h1 id="extraction-of-validating-shapes-from-very-large-knowledge-graphs" tabindex="-1">Extraction of Validating Shapes from very large Knowledge Graphs</h1>
<h2 id="kashif-rabbani%2C-matteo-lissandrini%2C-katja-hose" tabindex="-1">Kashif Rabbani, Matteo Lissandrini, Katja Hose</h2>
<section class="links">
<p>
Download a digital copy (<a class="attachment" href="https://people.cs.aau.dk/~matteo/pdf/VLDB23-SHAPES-p773-rabbani.pdf">PDF</a>)
or see <a href="https://people.cs.aau.dk/~matteo/pdf/VLDB23-shapes-poster.pdf">the Poster (PDF)</a>
and <a href="https://people.cs.aau.dk/~matteo/pdf/VLDB23-shapes-slides.pdf"> the Slides (PDF)</a>
</p>
<!-- <p>
You can also <a class="attachment" href="https://people.cs.aau.dk/~matteo/gdb.html">read more about the project</a>.
</p> --><p>
The final authenticated version is available online at <a href="https://doi.org/10.14778/3579075.3579078">10.14778/3579075.3579078</a>.
</p></section>
<h3 id="abstract%3A" tabindex="-1">Abstract:</h3>
<section class="intro secsubead">
Knowledge Graphs (KGs) represent heterogeneous domain knowledge on the Web and within organizations.
There exist shapes constraint languages to define validating shapes to ensure the quality of the data in KGs.
Existing techniques to extract validating shapes often fail to extract complete shapes, are not scalable, and are prone to produce spurious shapes.
To address these shortcomings, we propose the Quality Shapes Extraction (QSE) approach to extract validating shapes in very large graphs, for which we devise both an exact and an approximate solution.
QSE provides information about the reliability of shape constraints by computing their confidence and support within a KG; in doing so, it makes it possible to identify the shapes that are most informative and least likely to be affected by incomplete or incorrect data.
To the best of our knowledge, QSE is the first approach to extract a complete set of validating shapes from Wikidata.
Moreover, QSE provides a 12x reduction in extraction time compared to existing approaches, while managing to filter out up to 93% of the invalid and spurious shapes, resulting in a reduction of up to 2 orders of magnitude in the number of constraints presented to the user, e.g., from 11,916 to 809 on DBpedia.
</section>
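<p>The confidence and support statistics mentioned above can be illustrated as follows. This is a hedged simplification of the paper's definitions, with threshold values chosen arbitrarily: a candidate shape constraint is kept only if both its support (absolute count) and confidence (relative frequency within the class) exceed the thresholds.</p>
<pre><code class="lang-python">
# Minimal sketch of QSE-style pruning (an assumed simplification, not the
# paper's implementation): filter candidate property shapes by support
# and confidence.
def prune_shapes(candidates, class_size, min_support=2, min_confidence=0.5):
    """candidates maps property -> support; return the shapes that survive."""
    kept = {}
    for prop, support in candidates.items():
        confidence = support / class_size
        if support >= min_support and confidence >= min_confidence:
            kept[prop] = (support, round(confidence, 2))
    return kept

# 100 instances of a class; most have 'name', few carry a spurious typo property.
candidates = {"name": 98, "birthPlace": 60, "populaton": 3}
print(prune_shapes(candidates, class_size=100))
# {'name': (98, 0.98), 'birthPlace': (60, 0.6)}
</code></pre>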
<h3 id="cite%3A" tabindex="-1">Cite:</h3>
<blockquote itemscope="" itemtype="http://schema.org/ScholarlyArticle">
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Rabbani, Kashif</span>;
</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Lissandrini, Matteo</span>;
</span>
and
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Hose, Katja</span>.
</span>
<br>
“<span itemprop="headline name">Extraction of Validating Shapes from very large Knowledge Graphs</span>.”
<br>
<div class="hidden">
<time datetime="2022-12-01" itemprop="datePublished">December, 2022</time>
<span itemprop="image">https://people.cs.aau.dk/~matteo/images/edao-logo.png</span>
</div>
<span itemprop="isPartOf" itemscope="" itemtype="http://schema.org/Periodical">
<em><span itemprop="name">Proceedings of the VLDB Endowment</span></em>
</span>
<span itemprop="isPartOf" itemscope="" itemtype="http://schema.org/PublicationVolume">
<span itemprop="volumeNumber">16</span>
</span>, <span itemprop="isPartOf" itemscope="" itemtype="http://schema.org/PublicationIssue">
</span>
</blockquote>
<pre><code class="lang-bibtex">
@article{Rabbani:2022:Shapes,
author = {Rabbani, Kashif and Lissandrini, Matteo and Hose, Katja},
title = {Extraction of Validating Shapes from very large Knowledge Graphs},
journal = {PVLDB},
issue_date = {December 2022},
volume = {16},
year = {2022},
doi = {10.14778/3579075.3579078},
publisher = {VLDB Endowment}
}
</code></pre>
Example-Driven Exploratory Analytics over Knowledge Graphs
2022-07-15T00:00:00Z
https://people.cs.aau.dk/publications/conference/2022-edbt-reolap.html
<h1 id="example-driven-exploratory-analytics-over-knowledge-graphs" tabindex="-1">Example-Driven Exploratory Analytics over Knowledge Graphs</h1>
<h2 id="matteo-lissandrini%2C-katja-hose%2C-torben-bach-pedersen" tabindex="-1">Matteo Lissandrini, Katja Hose, Torben Bach Pedersen</h2>
<section class="links">
<p>
Download a digital copy (<a class="attachment" href="https://people.cs.aau.dk/~matteo/pdf/EDBT23-ReOLAP.pdf">PDF</a>)
</p><p>
The final authenticated version is available online at <a href="https://doi.org/10.48786/edbt.2023.09">10.48786/edbt.2023.09</a>.
</p></section>
<h3 id="abstract%3A" tabindex="-1">Abstract:</h3>
<section class="intro secsubead">
Due to their expressive power, Knowledge Graphs (KGs) have received increasing interest not only as means to structure and integrate heterogeneous information but also as a native storage format for large amounts of knowledge and statistical data.
Therefore, analytical queries over KG data, typically stored as RDF, have become increasingly important.
Yet, formulating such queries represents a difficult task for users that are not familiar with the query language (typically SPARQL) and the structure of the dataset at hand.
To overcome this limitation, we propose Re2xOLAP: the first comprehensive interactive approach that allows users to reverse-engineer and refine RDF exploratory OLAP queries over KGs containing statistical data.
Thus, Re2xOLAP enables users to perform KG exploratory analytics without writing any query at all.
We achieve this goal by first reverse-engineering analytical SPARQL queries from a small set of user-provided examples and then, given the reverse-engineered query, we propose intuitive and explainable exploratory query refinements to iteratively help the user obtain the desired information.
Our experiments on real-world large-scale KGs show that Re2xOLAP can efficiently reverse-engineer analytical SPARQL queries solely based on a small set of input examples.
Additionally, we demonstrate the expressive power of our interactive refinement methods by showing that Re2xOLAP allows users to navigate hundreds of thousands of different exploration paths with just a few interactions.
</section>
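<p>The reverse-engineering step can be illustrated with a toy example. The sketch below is a hypothetical simplification, not the Re2xOLAP algorithm: it searches for the predicates whose triples cover every user-provided example pair, which then serve as candidate query patterns.</p>
<pre><code class="lang-python">
# Toy illustration of reverse-engineering a query pattern from examples
# (our simplification, not the Re2xOLAP algorithm).
def candidate_predicates(triples, examples):
    """Return predicates p such that (s, p, v) holds for every example (s, v)."""
    predicates = {p for _, p, _ in triples}
    return sorted(
        p for p in predicates
        if all((s, p, v) in triples for s, v in examples)
    )

triples = {
    ("DK", "population", "5.9M"), ("IT", "population", "59M"),
    ("DK", "capital", "Copenhagen"), ("IT", "capital", "Rome"),
}
# The user exemplifies the desired analytics with two (country, value) pairs.
examples = [("DK", "5.9M"), ("IT", "59M")]
print(candidate_predicates(triples, examples))  # ['population']
</code></pre>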
<h3 id="cite%3A" tabindex="-1">Cite:</h3>
<blockquote itemscope="" itemtype="http://schema.org/ScholarlyArticle">
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Lissandrini, Matteo</span>;
</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Hose, Katja</span>;
</span>
and
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Pedersen, Torben Bach</span>.
</span>
<br>
“<span itemprop="headline name">Example-Driven Exploratory Analytics over Knowledge Graphs</span>.”
<br>
<span itemprop="isPartOf" itemscope="" itemtype="http://schema.org/Periodical">
<em><span itemprop="name">Proceedings of the 26th International Conference on Extending Database Technology, EDBT 2023</span></em>
</span>
<!-- (<span itemprop="pageStart">447</span>-<span itemprop="pageEnd">450</span>). -->
<div class="hidden">
<time datetime="2022-07-15" itemprop="datePublished">July, 2022</time>
<span itemprop="image">https://people.cs.aau.dk/~matteo/images/edao-logo.png</span>
</div>
</blockquote>
<figure class="body-figure large right">
<img src="https://people.cs.aau.dk/~matteo/images/ReOLAP-workflow.jpg" alt="The Exploratory Workflow: from reverse engineering, to reformulation via similarity search, then disaggregate, then subset">
<figcaption>
The Re2xOLAP approach: example query synthesis and refinement steps; each interaction step is mapped to a pair of arrows.
</figcaption>
</figure>
<pre><code class="lang-bibtex">
@inproceedings{edbt/LissandriniHP23,
title={Example-Driven Exploratory Analytics over Knowledge Graphs},
author={Lissandrini, Matteo and Hose, Katja and Pedersen, Torben Bach},
booktitle = {Proceedings of the 26th International Conference on Extending Database Technology, {EDBT} 2023},
volume={2023},
doi={10.48786/edbt.2023.09},
pages={105--117},
year={2023},
organization={OpenProceedings.org}
}
</code></pre>
Understanding RDF Data Representations in Triplestores
2022-06-19T00:00:00Z
https://people.cs.aau.dk/publications/conference/2022-sebd-rdfstorage.html
<h1 id="understanding-rdf-data-representations-in-triplestores" tabindex="-1">Understanding RDF Data Representations in Triplestores</h1>
<h2 id="matteo-lissandrini%2C-tomer-sagi%2C-torben-bach-pedersen%2C-katja-hose" tabindex="-1">Matteo Lissandrini, Tomer Sagi, Torben Bach Pedersen, Katja Hose</h2>
<section class="links">
<p>
Download a digital copy (<a class="attachment" href="https://people.cs.aau.dk/~matteo/pdf/SEBD22-RDFstorage.pdf">PDF</a>)
</p></section>
<h3 id="abstract%3A" tabindex="-1">Abstract:</h3>
<section class="intro secsubead">
<figure class="body-figure small left">
<img src="https://people.cs.aau.dk/~matteo/images/SCR-space.webp" alt="The SCR Space">
<figcaption>
The SCR system design space for RDF stores: Subdivision, Compression, and Redundancy.
</figcaption>
</figure>
Because of the flexibility and expressiveness of their model, Knowledge Graphs (KGs) have attracted increasing interest.
These resources are usually represented in RDF and stored in specialized data management systems called triplestores.
Yet, while there exists a multitude of such systems, exploiting varying data representation and indexing schemes, it is unclear which of the many design choices are the most effective for a given database and query workload.
Thus, we first introduce a set of 20 access patterns, grouped into 6 categories, that we use to analyze the needs of a given query workload.
Then, we identify a novel three-dimensional design space for RDF data representations built on the dimensions of subdivision, redundancy, and compression of data.
This design space maps the trade-offs between different RDF data representations employed to store RDF data within a triplestore.
Each required access pattern is then checked for its compatibility with a given data representation.
As we show, this approach makes it possible to identify both the most effective RDF data representation for a given query workload and unexplored design solutions.
</section>
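<p>The redundancy dimension of the SCR space can be sketched with a toy example (our illustration, not code from the paper): the same triples are stored under two permutations, SPO for subject-bound lookups and POS for predicate-bound lookups, so that each access pattern hits a matching index.</p>
<pre><code class="lang-python">
# Illustrative sketch of redundant RDF data representations: the same
# triples indexed in two orders so different triple patterns are cheap.
from collections import defaultdict

def build_indexes(triples):
    spo, pos = defaultdict(list), defaultdict(list)
    for s, p, o in triples:
        spo[s].append((p, o))  # serves the access pattern (s, ?, ?)
        pos[p].append((o, s))  # serves the access pattern (?, p, ?)
    return spo, pos

triples = [("alice", "knows", "bob"), ("alice", "a", "Person"), ("bob", "a", "Person")]
spo, pos = build_indexes(triples)
print(spo["alice"])      # [('knows', 'bob'), ('a', 'Person')]
print(sorted(pos["a"]))  # [('Person', 'alice'), ('Person', 'bob')]
</code></pre>
<p>Which permutations are worth materializing depends on the workload; the point of the design space is to make that trade-off explicit.</p>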
<h3 id="cite%3A" tabindex="-1">Cite:</h3>
<blockquote itemscope="" itemtype="http://schema.org/ScholarlyArticle">
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Lissandrini, Matteo</span>;
</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Sagi, Tomer</span>;
</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Bach Pedersen, Torben</span>;
</span>
and
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Hose, Katja</span>.
</span>
<br>
“<span itemprop="headline name">Understanding RDF Data Representations in Triplestores</span>.”
<span itemprop="isPartOf" itemscope="" itemtype="http://schema.org/Periodical">
<em><span itemprop="name">Proceedings of the 30th Italian Symposium on Advanced Database Systems, SEBD 2022</span></em>
</span>
<div class="hidden">
<br>
<time datetime="2022-06-19" itemprop="datePublished">June, 2022</time>
</div>
</blockquote>
<pre><code class="lang-bibtex">
@inproceedings{sebd/Lissandrini22,
author = {Matteo Lissandrini and
Tomer Sagi and
Torben Bach Pedersen and
Katja Hose},
title = {Understanding RDF Data Representations in Triplestores},
booktitle = {30th Italian Symposium on Advanced Database Systems, {SEBD} 2022, Online Proceedings},
publisher = {CEUR-WS.org},
year = {2022}
}
</code></pre>
SHACL and ShEx in the Wild: A Community Survey on Validating Shapes Generation and Adoption
2022-03-01T00:00:00Z
https://people.cs.aau.dk/publications/demo/2022-www-shaclsurvey.html
<h1 id="shacl-and-shex-in-the-wild%3A-a-community-survey-on-validating-shapes-generation-and-adoption" tabindex="-1">SHACL and ShEx in the Wild: A Community Survey on Validating Shapes Generation and Adoption</h1>
<h2 id="kashif-rabbani%2C-matteo-lissandrini%2C-katja-hose" tabindex="-1">Kashif Rabbani, Matteo Lissandrini, Katja Hose</h2>
<section class="links">
<p>
Download a digital copy (<a class="attachment" href="https://people.cs.aau.dk/~matteo/pdf/WebConf22-SHACL-Survey.pdf">PDF</a>) or <a href="https://relweb.cs.aau.dk/validatingshapes/">visit the official project page</a>.
</p></section>
<h3 id="abstract%3A" tabindex="-1">Abstract:</h3>
<figure class="body-figure right">
<img src="https://people.cs.aau.dk/~matteo/images/SHACL-ValidatingShapesSurveyChart.png" alt="Result of survey">
<figcaption>
Analysis of how validating shapes are extracted: 86% of respondents build shapes manually.
</figcaption>
</figure>
<section class="intro secsubead">
Knowledge Graphs (KGs) are widely used to represent heterogeneous domain knowledge on the Web and within organizations. Various methods exist to manage KGs and ensure the quality of their data. Among these, the Shapes Constraint Language (SHACL) and the Shape Expressions language (ShEx) are the two state-of-the-art languages to define validating shapes for KGs. Since the usage of these constraint languages has recently increased, new needs have arisen. One such need is to enable the efficient generation of these shapes. Yet, since these languages are relatively new, we witness a lack of understanding of how they are effectively employed for existing KGs. Therefore, in this work, we answer the question: how are validating shapes being generated and adopted? Our contribution is threefold. First, we conducted a community survey to analyze the needs of users (both from industry and academia) generating validating shapes. Then, we cross-referenced our results with an extensive survey of the existing tools and their features. Finally, we investigated how existing automatic shape extraction approaches work in practice on real, large KGs. Our analysis shows the need for developing semi-automatic methods that can help users generate shapes from large KGs.
</section>
<h3 id="cite%3A" tabindex="-1">Cite:</h3>
<blockquote itemscope="" itemtype="http://schema.org/ScholarlyArticle">
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Rabbani, Kashif</span>;
</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Lissandrini, Matteo</span>;
</span>
and
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Hose, Katja</span>.
</span>
<br>
“<span itemprop="headline name">SHACL and ShEx in the Wild: A Community Survey on Validating Shapes Generation and Adoption</span>.”
<div class="hidden">
<time datetime="2022-03-01" itemprop="datePublished">March, 2022</time>
</div>
<span itemprop="isPartOf" itemscope="" itemtype="http://schema.org/Periodical">
<em><span itemprop="name">The Web Conference 2022</span></em>
</span>
<!-- (pp. <span itemprop="pageStart">75</span>-<span itemprop="pageEnd">79</span>). -->
</blockquote>
Panel discussion at the European Big Data Forum
2022-01-11T00:00:00Z
https://people.cs.aau.dk/notes/european-big-data.html
<figure class="show-logo body-figure small left">
<img class="img-thumbnail" alt="The BIG Data Forum logo" src="https://people.cs.aau.dk/~matteo/images/big-data-forum-logo.png">
</figure>
<section class="intro secsubhead">
<p>
What's the vision for the future of discovering and pricing data in Data Marketplaces?
</p>
<p>
I was honored to participate in a very interesting panel discussion organized by <a href="https://www.opertusmundi.eu/">Opertus Mundi</a> at the <a href="https://european-big-data-value-forum.eu/">European Big Data Value Forum</a> (EBDVF).
</p>
</section>
<p>
The panel, chaired by <a href="http://asterios.katsifodimos.com/">Asterios Katsifodimos (Delft)</a> and <a href="https://andraionescu.github.io/">Andra-Denis Ionescu</a>, had the purpose of increasing awareness and discussing the top challenges in discovering and pricing data in data marketplaces.
</p>
<p>
In the panel I joined <a href="https://www.pi.uni-hannover.de/de/dbs/team/abedjan/">Ziawasch Abedjan</a> (Leibniz University Hannover) and <a href="https://velgias.github.io/">Yannis Velegrakis</a> (Utrecht University).
</p>
<figure class="show-logo body-figure small right">
<img class="img-thumbnail" alt="The Search Task" src="https://people.cs.aau.dk/~matteo/images/big-data-forum-image.png">
</figure>
<p>My main message?</p>
<blockquote class="important">
<p>Knowledge Graphs are key enablers to provide proof of relevance.</p>
</blockquote>
<p>
Want to know more?
<a href="https://www.opertusmundi.eu/opertus-mundi-closes-2021-with-a-key-panel-discussion-at-the-ebdvf/">Read the full comment on the Opertus Mundi website</a>, and check out my recent work on <a href="https://people.cs.aau.dk/~matteo/notes/edao-announce.html">Example Driven Analytics of Open Knowledge Graphs</a> and <a href="https://people.cs.aau.dk/~matteo/notes/kg-exploration-sigweb-survey.html">Knowledge Graph Exploration</a>.
</p>
Knowledge Graph Exploration Systems: are we lost?
2022-01-10T00:00:00Z
https://people.cs.aau.dk/publications/conference/2022-cidr-graph-exploration.html
<h1 id="knowledge-graph-exploration-systems%3A-are-we-lost%3F" tabindex="-1">Knowledge Graph Exploration Systems: are we lost?</h1>
<h2 id="matteo-lissandrini%2C-davide-mottin%2C-katja-hose%2C-torben-bach-pedersen" tabindex="-1">Matteo Lissandrini, Davide Mottin, Katja Hose, Torben Bach Pedersen</h2>
<section class="links">
<p>
Download a digital copy (<a class="attachment" href="https://people.cs.aau.dk/~matteo/pdf/CIDR22-kg-exploration.pdf">PDF</a>)
or see <a class="attachment" href="https://www.youtube.com/watch?v=IoPzWIXH7kQ">the recorded presentation</a>
</p></section>
<h3 id="abstract%3A" tabindex="-1">Abstract:</h3>
<section class="intro secsubead">
Knowledge graphs (KGs) represent facts in the form of nodes and relationships and are widely used to represent and share knowledge in many different domains.
However, their widespread adoption to integrate different data sources and their generation processes have made KGs very complicated and difficult to understand, leading to the advent of new <em>knowledge graph exploration</em> approaches to better understand their contents and extract relevant insights.
Nevertheless, the needs of current KG exploration use cases are not met, and are sometimes even neglected, by existing KG data management systems.
Hence, the question: are we lost?
We hope not.
Therefore, with the aim of fostering research on these open issues, in this position paper, we first present an overview of state-of-the-art approaches for KG exploration.
Then, we identify the (currently unmet) requirements for effective KG exploration systems, and finally, we highlight promising research directions for the realization of a system able to fully support knowledge graph exploration.
</section>
<figure class="body-figure large">
<img src="https://people.cs.aau.dk/~matteo/images/kg-exploration-system.jpg" alt="A KG Exploration system diagram, describing user interactions, the exploration workflow, and the components divided into query processing, query optimization, and optimized data storage.">
<figcaption>
The core components of a KG Exploration System.
</figcaption>
</figure>
<h3 id="cite%3A" tabindex="-1">Cite:</h3>
<blockquote itemscope="" itemtype="http://schema.org/ScholarlyArticle">
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Lissandrini, Matteo</span>;
</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Mottin, Davide</span>;
</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Hose, Katja</span>;
</span>
and
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Bach Pedersen, Torben</span>.
</span>
<br>
“<span itemprop="headline name">Knowledge Graph Exploration Systems: are we lost?</span>.”
<span itemprop="isPartOf" itemscope="" itemtype="http://schema.org/Periodical">
<em><span itemprop="name">Proceedings of the 12th Conference on Innovative Data Systems Research, CIDR 2022</span></em>
</span>
<div class="hidden">
<br>
<time datetime="2022-01-10" itemprop="datePublished">January, 2022</time>
</div>
</blockquote>
<pre><code class="lang-bibtex">
@inproceedings{cidr/Lissandrini22,
author = {Matteo Lissandrini and
Davide Mottin and
Katja Hose and
Torben Bach Pedersen},
title = {Knowledge Graph Exploration Systems: are we lost?},
booktitle = {12th Conference on Innovative Data Systems Research, {CIDR} 2022, Online Proceedings},
publisher = {www.cidrdb.org},
year = {2022}
}
</code></pre>
A core ontology for modeling life cycle sustainability assessment on the Semantic Web
2021-12-25T00:00:00Z
https://people.cs.aau.dk/publications/journal/2021-jie-odas-extended.html
<h1 id="a-core-ontology-for-modeling-life-cycle-sustainability-assessment-on-the-semantic-web" tabindex="-1">A core ontology for modeling life cycle sustainability assessment on the Semantic Web</h1>
<h2 id="agneta-ghose%2C-matteo-lissandrini%2C-emil-riis-hansen%2C-bo-pedersen-weidema" tabindex="-1">Agneta Ghose, Matteo Lissandrini, Emil Riis Hansen, Bo Pedersen Weidema</h2>
<section class="links">
<p>
Download a digital copy (<a class="attachment" href="https://people.cs.aau.dk/~matteo/pdf/JIE-odas-bonsai.pdf">PDF</a>)
</p><p>
The final authenticated version is available online at <a href="https://doi.org/10.1111/jiec.13220">10.1111/jiec.13220</a>.
</p></section>
<h3 id="abstract%3A" tabindex="-1">Abstract:</h3>
<section class="intro secsubead">
The use of the Semantic Web and linked data improves data accessibility, interpretability, and interoperability. It supports cross-domain data and knowledge sharing and avoids the creation of research data silos. While widely adopted in several research domains, the Semantic Web has seen relatively limited use with respect to sustainability assessments. A primary barrier is that the framework of principles and technologies required to link and query data from the Semantic Web is often beyond the scope of industrial ecologists. Linking a dataset to the Semantic Web requires the development of a semantically linked core ontology in addition to the use of existing ontologies. Ontologies give logical meaning to the data and make it possible to develop a machine-readable data format.
To enable and support the uptake of semantic ontologies, we present a core ontology developed specifically to capture the data relevant for life cycle sustainability assessment. We further demonstrate the utility of the ontology by using it to integrate data relevant to sustainability assessments, such as EXIOBASE and the Yale Stocks and Flow Database to the Semantic Web. These datasets can be accessed by the machine-readable endpoint using SPARQL, a semantic query language. The present work provides the foundation necessary to enhance the use of Semantic Web with respect to sustainability assessments. Finally, we provide our perspective on the challenges toward the adoption of Semantic Web technologies and technical solutions that can address these challenges.
</section>
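<p>As a small illustration of how such a machine-readable endpoint is accessed, the sketch below builds the GET request URL for a SPARQL query; the endpoint address is a placeholder, not the project's actual endpoint.</p>
<pre><code class="lang-python">
# Hedged sketch: constructing a SPARQL-over-HTTP GET request URL.
from urllib.parse import urlencode

def sparql_request_url(endpoint, query):
    """Build the GET URL for a SPARQL query against an endpoint."""
    return endpoint + "?" + urlencode({"query": query, "format": "json"})

# List ten arbitrary triples; a real query would target the ontology's classes.
query = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"
url = sparql_request_url("https://example.org/sparql", query)
print(url.startswith("https://example.org/sparql?query=SELECT"))  # True
</code></pre>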
<h3 id="cite%3A" tabindex="-1">Cite:</h3>
<blockquote itemscope="" itemtype="http://schema.org/ScholarlyArticle">
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Agneta Ghose</span>;
</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Matteo Lissandrini</span>;
</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Emil Riis Hansen</span>;
</span>
and
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Bo Pedersen Weidema</span>.
</span>
<br>
“<span itemprop="headline name">A core ontology for modeling life cycle sustainability assessment on the Semantic Web</span>.”
<br>
<time datetime="2021-12-25" itemprop="datePublished">December, 2021</time>
<span itemprop="isPartOf" itemscope="" itemtype="http://schema.org/Periodical">
<em><span itemprop="name">Journal of Industrial Ecology</span></em>
</span> (<span itemprop="pageStart">1</span>-<span itemprop="pageEnd">17</span>).
</blockquote>
<pre><code class="lang-bibtex">
@article{bonsai:jiec.13220,
author = {Ghose, Agneta and Lissandrini, Matteo and Hansen, Emil Riis and Weidema, Bo Pedersen},
title = {A core ontology for modeling life cycle sustainability assessment on the Semantic Web},
journal = {Journal of Industrial Ecology},
volume = {26},
number = {3},
pages = {731-747},
keywords = {database, industrial ecology, interoperable data, ontology, open data, Semantic Web},
doi = {10.1111/jiec.13220},
year = {2022}
}
</code></pre>
A design space for RDF data representations
2021-12-09T00:00:00Z
https://people.cs.aau.dk/publications/journal/2022-vldbj-rdfstorage.html
<h1 id="a-design-space-for-rdf-data-representations" tabindex="-1">A design space for RDF data representations</h1>
<h2 id="tomer-sagi%2C-matteo-lissandrini%2C-torben-bach-pedersen%2C-katja-hose" tabindex="-1">Tomer Sagi, Matteo Lissandrini, Torben Bach Pedersen, Katja Hose</h2>
<section class="links">
<p>
Download a digital copy (<a class="attachment" href="https://people.cs.aau.dk/~matteo/pdf/vldbj-rdfstorage.pdf">PDF</a>)
</p><p>
The final authenticated version is available online at <a href="https://doi.org/10.1007/s00778-021-00725-x">10.1007/s00778-021-00725-x</a>.
</p></section>
<h3 id="abstract%3A" tabindex="-1">Abstract:</h3>
<figure class="body-figure small left">
<img src="https://people.cs.aau.dk/~matteo/images/SCR-space.webp" alt="The SCR Space">
<figcaption>
The SCR system design space for RDF stores: Subdivision, Compression, and Redundancy.
</figcaption>
</figure>
<section class="intro secsubead">
RDF triplestores’ ability to store and query knowledge bases augmented with semantic annotations has attracted the attention of both research and industry. A multitude of systems offer varying data representation and indexing schemes. However, as recently shown for designing data structures, many design choices are biased by outdated considerations and may not result in the most efficient data representation for a given query workload. To overcome this limitation, we identify a novel three-dimensional design space. Within this design space, we map the trade-offs between different RDF data representations employed as part of an RDF triplestore and identify unexplored solutions. We complement the review with an empirical evaluation of ten standard SPARQL benchmarks to examine the prevalence of these access patterns in synthetic and real query workloads. We find some access patterns to be both prevalent in the workloads and under-supported by existing triplestores. This shows that our model can be used by RDF store designers to reason about different design choices and allows a (possibly artificially intelligent) designer to evaluate the fit between a given system design and a query workload.
</section>
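<p>As a small illustration of the subdivision dimension of the design space (our sketch, not the paper's code), the snippet below vertically partitions triples by predicate, so that a predicate-bound access pattern touches only one partition instead of the whole triple table.</p>
<pre><code class="lang-python">
# Illustrative sketch of subdivision: vertical partitioning of RDF triples
# by predicate, one (subject, object) table per predicate.
from collections import defaultdict

def vertical_partition(triples):
    parts = defaultdict(list)
    for s, p, o in triples:
        parts[p].append((s, o))
    return parts

triples = [("alice", "knows", "bob"), ("bob", "age", "42"), ("alice", "age", "39")]
parts = vertical_partition(triples)
# A (?, age, ?) pattern now scans only the 'age' partition.
print(sorted(parts["age"]))  # [('alice', '39'), ('bob', '42')]
</code></pre>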
<figure class="body-figure large right">
<img src="https://people.cs.aau.dk/~matteo/images/RDF-design-examples.jpg" alt="Different RDF data representations">
<figcaption>
Examples of data representations: (a) sorted file, (b) hash map, (c) property table, and (d) B+ tree.
</figcaption>
</figure>
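<p>
As a toy illustration of the trade-offs above, the following sketch (a hypothetical example; all data and names are invented) answers the access pattern “given a subject, return all its triples” under two of the representations from the figure: a sorted file queried by binary search, and a hash map keyed on subject.
</p>

```python
from bisect import bisect_left

# Toy triples (subject, predicate, object); data is illustrative only.
triples = [
    ("alice", "knows", "bob"),
    ("alice", "worksAt", "aau"),
    ("bob", "knows", "carol"),
]

# (a) Sorted-file representation: keep triples in SPO order and
# binary-search for the first triple with the given subject.
sorted_spo = sorted(triples)

def lookup_sorted(subject):
    i = bisect_left(sorted_spo, (subject,))
    out = []
    while i < len(sorted_spo) and sorted_spo[i][0] == subject:
        out.append(sorted_spo[i])
        i += 1
    return out

# (b) Hash-map representation: a dictionary keyed on subject gives
# constant-time lookup at the cost of extra space (redundancy).
spo_index = {}
for s, p, o in triples:
    spo_index.setdefault(s, []).append((s, p, o))

def lookup_hash(subject):
    return spo_index.get(subject, [])
```

<p>
Both structures answer the same pattern; a real triplestore combines several such structures, and the design space makes the per-pattern choice explicit.
</p>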
<h3 id="cite%3A" tabindex="-1">Cite:</h3>
<blockquote itemscope="" itemtype="http://schema.org/ScholarlyArticle">
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Sagi, Tomer</span>;
</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Lissandrini, Matteo</span>;
</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Bach Pedersen, Torben</span>;
</span>
and
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Hose, Katja</span>.
</span>
<br>
“<span itemprop="headline name">A design space for RDF data representations</span>.”
<br>
<div class="hidden">
<span itemprop="image">https://people.cs.aau.dk/~matteo//images/EDAO-logo.png</span>
</div>
<span itemprop="isPartOf" itemscope="" itemtype="http://schema.org/Periodical">
<em><span itemprop="name">The VLDB Journal</span></em>
</span>, <span itemprop="isPartOf" itemscope="" itemtype="http://schema.org/PublicationVolume"> <span itemprop="volumeNumber">31</span></span>
(<time datetime="2022-01-21" itemprop="datePublished">January, 2022</time>):
<span itemprop="pageStart">347</span>-<span itemprop="pageEnd">373</span>.
</blockquote>
<pre><code class="lang-bibtex">
@article{SagiLPH22,
author = {Tomer Sagi and
Matteo Lissandrini and
Torben Bach Pedersen and
Katja Hose},
title = {A design space for {RDF} data representations},
journal = {VLDB J.},
volume = {31},
number = {2},
pages = {347--373},
year = {2022},
doi = {10.1007/s00778-021-00725-x}
}
</code></pre>
Data Citation and the Citation Graph2021-09-29T00:00:00Zhttps://people.cs.aau.dk/publications/journal/2021-qss-citation-graph.html<h1 id="data-citation-and-the-citation-graph" tabindex="-1">Data Citation and the Citation Graph</h1>
<h2 id="peter-buneman%2C-dennis-dosso%2C-matteo-lissandrini%2C-gianmaria-silvello" tabindex="-1">Peter Buneman, Dennis Dosso, Matteo Lissandrini, Gianmaria Silvello</h2>
<section class="links">
<p>
Download a digital copy (<a class="attachment" href="https://people.cs.aau.dk/~matteo/pdf/QSS-2021-citation-graph.pdf">PDF</a>)
</p><p>
The final authenticated version is available online at <a href="https://doi.org/10.1162/qss_a_00166">10.1162/qss_a_00166</a>.
</p></section>
<h3 id="abstract%3A" tabindex="-1">Abstract:</h3>
<section class="intro secsubead">
The citation graph is a computational artifact that is widely used to represent the domain of published literature. It represents connections between published works, such as citations and authorship. Among other things, the graph supports the computation of bibliometric measures such as h-indexes and impact factors. There is now an increasing demand that we should treat the publication of data in the same way that we treat conventional publications. In particular, we should cite data for the same reasons that we cite other publications. In this paper, we discuss the current limitations of the citation graph to represent data citation. We identify two critical challenges: to model the evolution of credit appropriately (through references) over time and the ability to model data citation not only for whole datasets (as single objects) but also for parts of them. We describe an extension of the current citation graph model that addresses these challenges. It is built on two central concepts: citable units and reference subsumption. We discuss how this extension would enable data citation to be represented within the citation graph and how it allows for improvements in current practices for bibliometric computations both for scientific publications and for data.
</section>
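<p>
To make the two concepts concrete, here is a minimal, hypothetical sketch (all identifiers are invented) of how reference subsumption could let citations to a citable unit, such as a table within a dataset, also accrue credit to the containing dataset.
</p>

```python
# part -> containing citable unit (reference subsumption); invented IDs.
subsumes = {
    "dataset/v1/table3": "dataset/v1",
}

# (citing paper, cited citable unit)
citations = [
    ("paperA", "dataset/v1/table3"),
    ("paperB", "dataset/v1"),
]

def credit(unit):
    """Count citations to `unit`, including citations subsumed from its parts."""
    total = 0
    for _, cited in citations:
        while True:
            if cited == unit:
                total += 1
                break
            if cited not in subsumes:
                break
            cited = subsumes[cited]  # walk up the subsumption chain
    return total
```

<p>
Here the whole dataset accrues both citations, while the part alone retains only its direct one, so bibliometric measures can be computed at either granularity.
</p>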
<h3 id="cite%3A" tabindex="-1">Cite:</h3>
<blockquote itemscope="" itemtype="http://schema.org/ScholarlyArticle">
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Buneman, Peter</span>;
</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Dosso, Dennis</span>;
</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Lissandrini, Matteo</span>;
</span>
and
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Silvello, Gianmaria</span>;
</span>
<br>
“<span itemprop="headline name">Data Citation and the Citation Graph</span>.”
<br>
<time datetime="2021-09-29" itemprop="datePublished">September, 2021</time>
<span itemprop="isPartOf" itemscope="" itemtype="http://schema.org/Periodical">
<em><span itemprop="name">Quantitative Science Studies</span></em>
</span> (<span itemprop="pageStart">1399</span>-<span itemprop="pageEnd">1422</span>).
</blockquote>
<pre><code class="lang-bibtex">
@article{10.1162/qss_a_00166,
author = {Buneman, Peter and Dosso, Dennis and Lissandrini, Matteo and Silvello, Gianmaria},
title = "{Data citation and the citation graph}",
journal = {Quantitative Science Studies},
volume = {2},
number = {4},
pages = {1399-1422},
year = {2022},
month = {02},
issn = {2641-3337},
doi = {10.1162/qss_a_00166},
url = {https://doi.org/10.1162/qss_a_00166},
eprint = {https://direct.mit.edu/qss/article-pdf/2/4/1399/1986111/qss_a_00166.pdf},
}
</code></pre>
Continuing but in a New Role!2021-09-15T00:00:00Zhttps://people.cs.aau.dk/notes/new-role.html<!-- 10-2023 -->
<section class="intro secsubhead">
<p>
I’m very excited to keep working at Aalborg University, but now in a new role: I am an Assistant Professor in the Department of Computer Science.
</p>
<p>
I will continue my research in the field of <b>Data Exploration, Knowledge Graphs, and Graph Data Management.</b>
I will pursue my research goals and also continue teaching all the topics I love, plus some new ones (e.g., Web Data Science!).
</p>
</section>
<figure class="show-logo body-figure">
<img class="img-thumbnail" alt="AAU + Exemplar Search + RDF" src="https://people.cs.aau.dk/~matteo/images/xolap-search.png">
</figure>
<p>
My goal for my research work at Aalborg University in the Database and Web Technologies group is the study of high-performance interactive methods to explore and extract knowledge from heterogeneous data by exploiting knowledge graphs and Graph DBMS.
I believe these methods will be especially useful for supporting web science and data analytics on the Web but also in many other domains.
Moreover, I plan to study methods to design and develop a new generation of self-optimizing graph data management systems.
</p>
A foundation for spatio-textual-temporal cube analytics2021-07-11T00:00:00Zhttps://people.cs.aau.dk/publications/journal/2022-infsys-sttolap.html<h1 id="a-foundation-for-spatio-textual-temporal-cube-analytics" tabindex="-1">A foundation for spatio-textual-temporal cube analytics</h1>
<h2 id="mohsin-iqbal%2C-matteo-lissandrini%2C-torben-bach-pedersen" tabindex="-1">Mohsin Iqbal, Matteo Lissandrini, Torben Bach Pedersen</h2>
<section class="links">
<p>
Download a digital copy (<a class="attachment" href="https://people.cs.aau.dk/~matteo/pdf/vldbj-rdfstorage.pdf">PDF</a>)
</p><p>
The final authenticated version is available online at <a href="https://doi.org/10.1016/j.is.2022.102009">10.1016/j.is.2022.102009</a>.
</p></section>
<h3 id="abstract%3A" tabindex="-1">Abstract:</h3>
<section class="intro secsubead">
Large amounts of spatial, textual, and temporal (STT) data are being produced daily. This is data containing an unstructured component (text), a spatial component (geographic position), and a time component (timestamp). Therefore, there is a need for a powerful and general way of analyzing STT data together. In this paper, we define and formalize the Spatio-Textual-Temporal Cube (STTCube) structure to enable combined effective and efficient analytical queries over STT data.
Our data model over STT objects enables novel joint and integrated STT insights that are hard to obtain using existing methods. Furthermore, our proposed STTCube Incremental Maintenance method (IM) efficiently maintains an already constructed STTCube when new data arrives.
Moreover, we introduce the new concept of STT measures with associated novel STT-OLAP operators.
To allow for efficient large-scale analytics, we present a pre-aggregation framework for exact and approximate computation of STT measures. Our comprehensive experimental evaluation on a real-world Twitter dataset confirms that our proposed methods reduce query response time by 1–5 orders of magnitude compared to the No Materialization baseline and decrease storage cost between 97% and 99.9% compared to the Full Materialization baseline while adding only a negligible overhead in the STTCube construction time. Moreover, approximate computation achieves an accuracy between 90% and 100% while reducing query response time by 3–5 orders of magnitude compared to No Materialization and IM achieves an order of magnitude improvement in maintenance time compared to the baseline maintenance method.
</section>
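<p>
The core pre-aggregation idea can be sketched in a few lines (a hedged toy example: a real STTCube has full spatial, textual, and temporal hierarchies and several measures, whereas here a single count measure is materialized at one fixed granularity, and the data is invented):
</p>

```python
from collections import Counter

# Hypothetical STT objects: (set of terms, region, day).
tweets = [
    ({"rain", "wind"}, "aalborg", "2021-07-01"),
    ({"rain"}, "aalborg", "2021-07-01"),
    ({"sun"}, "copenhagen", "2021-07-02"),
]

# Materialize a count measure per (region, term, day) cell.
cube = Counter()
for terms, region, day in tweets:
    for term in terms:
        cube[(region, term, day)] += 1

def query(region, term, day):
    # Answered from the pre-aggregate, without rescanning the raw objects.
    return cube[(region, term, day)]
```

<p>
Queries hit the materialized cells directly, which is what yields the orders-of-magnitude speedups over recomputing from raw data.
</p>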
<h3 id="cite%3A" tabindex="-1">Cite:</h3>
<blockquote itemscope="" itemtype="http://schema.org/ScholarlyArticle">
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Mohsin Iqbal</span>;
</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Matteo Lissandrini</span>;
</span>
and
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Torben Bach Pedersen</span>;
</span>
<br>
“<span itemprop="headline name">A foundation for spatio-textual-temporal cube analytics</span>.”
<br>
<div class="hidden">
<span itemprop="image">https://people.cs.aau.dk/~matteo//images/EDAO-logo.png</span>
</div>
<span itemprop="isPartOf" itemscope="" itemtype="http://schema.org/Periodical">
<em><span itemprop="name">Information Systems</span></em>
</span>,
<!-- <span itemprop="isPartOf" itemscope itemtype="http://schema.org/PublicationVolume">
<span itemprop="volumeNumber">31</span></span> -->
(<time datetime="2022-01-21" itemprop="datePublished">February, 2022</time>).
</blockquote>
<pre><code class="lang-bibtex">
@article{IQBAL2022102009,
title = {A foundation for spatio-textual-temporal cube analytics},
journal = {Information Systems},
pages = {102009},
year = {2022},
issn = {0306-4379},
doi = {10.1016/j.is.2022.102009},
url = {https://www.sciencedirect.com/science/article/pii/S0306437922000199},
author = {Mohsin Iqbal and Matteo Lissandrini and Torben Bach Pedersen}
}
</code></pre>
SOFOS: Demonstrating the Challenges of Materialized View Selection on Knowledge Graphs2021-06-21T00:00:00Zhttps://people.cs.aau.dk/publications/demo/2021-sigmod-sofos-demo.html<h1 id="sofos%3A-demonstrating-the-challenges-of-materialized-view-selection-on-knowledge-graphs" tabindex="-1">SOFOS: Demonstrating the Challenges of Materialized View Selection on Knowledge Graphs</h1>
<h2 id="georgia-troullinou%2C-haridimos-kondylakis%2C-matteo-lissandrini%2C-davide-mottin" tabindex="-1">Georgia Troullinou, Haridimos Kondylakis, Matteo Lissandrini, Davide Mottin</h2>
<section class="links">
<p>
Download a digital copy (<a class="attachment" href="https://people.cs.aau.dk/~matteo/pdf/SIGMOD21-SOFOS-demo.pdf">PDF</a>)
or watch <a class="attachment" href="https://www.youtube.com/watch?v=HpW-mgc130o">the Demo (on YouTube)</a>
</p><p>
The final authenticated version is available online at <a href="https://doi.org/10.1145/3448016.3452765">10.1145/3448016.3452765</a>.
</p></section>
<h3 id="this-work-has-been-presented-tue.-june-22nd%2C-2021%2C-online" tabindex="-1">This work has been presented <a href="https://2021.sigmod.org/program/program_tuesday.shtml">Tue. June 22nd, 2021, online</a></h3>
<h3 id="abstract%3A" tabindex="-1">Abstract:</h3>
<section class="intro secsubead">
Analytical queries over RDF data are becoming prominent as a result of the proliferation of knowledge graphs. Yet, RDF databases are not optimized to perform such queries efficiently, leading to long processing times. A well known technique to improve the performance of analytical queries is to exploit materialized views. Although popular in relational databases, view materialization for RDF and SPARQL has not yet transitioned into practice, due to the non-trivial application to the RDF graph model. Motivated by a lack of understanding of the impact of view materialization alternatives for RDF data, we demonstrate SOFOS, a system that implements and compares several cost models for view materialization. SOFOS is, to the best of our knowledge, the first attempt to adapt cost models, initially studied in relational data, to the generic RDF setting, and to propose new ones, analyzing their pitfalls and merits. SOFOS takes an RDF dataset and an analytical query for some facet in the data, and compares and evaluates alternative cost models, displaying statistics and insights about time, memory consumption, and query characteristics.
</section>
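<p>
A rough sense of what a view-selection cost model weighs can be given by a toy sketch (the linear benefit formula, the numbers, and the view names are invented for illustration; they are not SOFOS's actual cost models):
</p>

```python
# (view name, size, evaluation cost saved per query, query frequency, maintenance cost)
candidates = [
    ("v_types_by_author", 120, 8.0, 50, 30.0),
    ("v_counts_by_year", 40, 3.0, 200, 10.0),
]

def benefit(size, saved, freq, maint, budget=100):
    """Net benefit of materializing a view under a space budget."""
    if size > budget:
        return float("-inf")  # view does not fit in the budget
    return freq * saved - maint

# Pick the candidate view with the highest estimated benefit.
best = max(candidates, key=lambda c: benefit(*c[1:]))
```

<p>
Cost models differ precisely in how they estimate the saved evaluation cost and the maintenance cost on an RDF graph, which is what SOFOS lets users compare.
</p>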
<h3 id="cite%3A" tabindex="-1">Cite:</h3>
<blockquote itemscope="" itemtype="http://schema.org/ScholarlyArticle">
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Troullinou, Georgia</span>;
</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Kondylakis, Haridimos</span>;
</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Lissandrini, Matteo</span>;
</span>
and
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Mottin, Davide</span>.
</span>
<br>
“<span itemprop="headline name">SOFOS: Demonstrating the Challenges of Materialized View Selection on Knowledge Graphs</span>.”
<div class="hidden">
<time datetime="2021-06-21" itemprop="datePublished">June, 2021</time>
</div>
<span itemprop="isPartOf" itemscope="" itemtype="http://schema.org/Periodical">
<em><span itemprop="name">Proceedings of the 2021 ACM SIGMOD International Conference on Management of Data</span></em>
</span>
(pp. <span itemprop="pageStart">2789</span>-<span itemprop="pageEnd">2793</span>).
</blockquote>
<pre><code class="lang-bibtex">
@inbook{TroullinouKondylakis21,
author = {Troullinou, Georgia and Kondylakis, Haridimos and Lissandrini, Matteo and Mottin, Davide},
title = {SOFOS: Demonstrating the Challenges of Materialized View Selection on Knowledge Graphs},
year = {2021},
isbn = {9781450383431},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3448016.3452765},
doi = {10.1145/3448016.3452765},
booktitle = {Proceedings of the 2021 International Conference on Management of Data},
pages = {2789–2793},
numpages = {5}
}
</code></pre>
LSQB: A Large-Scale Subgraph Query Benchmark2021-06-20T00:00:00Zhttps://people.cs.aau.dk/publications/workshop/2021-sigmod-grades-lsqb.html<h1 id="lsqb%3A-a-large-scale-subgraph-query-benchmark" tabindex="-1">LSQB: A Large-Scale Subgraph Query Benchmark</h1>
<h2 id="amine-mhedhbi%2C-matteo-lissandrini%2C-laurens-kuiper%2C-jack-waudby%2C-gabor-szarnyas" tabindex="-1">Amine Mhedhbi, Matteo Lissandrini, Laurens Kuiper, Jack Waudby, Gabor Szarnyas</h2>
<section class="links">
<p>
Download a digital copy (<a class="attachment" href="https://people.cs.aau.dk/~matteo/pdf/SIGMOD21-GRADES-lsqb.pdf">PDF</a>)
or get <a class="attachment" href="https://github.com/ldbc/lsqb">the code</a>
</p>
<p>
The final authenticated version is available online at <a href="https://doi.org/10.1145/3461837.3464516">10.1145/3461837.3464516</a>.
</p>
</section>
<h3 id="this-work-has-been-presented-sun.-june-20th%2C-online" tabindex="-1">This work has been presented <a href="https://gradesnda.github.io/">Sun. June 20th, online</a></h3>
<h3 id="abstract%3A" tabindex="-1">Abstract:</h3>
<section class="intro secsubead">
We introduce LSQB, a new large-scale subgraph query benchmark.
LSQB tests the performance of database management systems on an important class of subgraph queries overlooked by existing benchmarks.
Matching a labelled structural graph pattern, referred to as subgraph matching, is the focus of LSQB.
In relational terms, the benchmark tests DBMSs' join performance as a choke point, since subgraph matching is equivalent to multi-way joins between base Vertex and base Edge tables on ID attributes.
The benchmark focuses on read-heavy workloads by relying on global queries, which have been ignored by prior benchmarks.
Global queries, also referred to as unseeded queries, are queries that are only constrained by labels on the query vertices and edges.
LSQB contains a total of nine queries and leverages the LDBC social network data generator for scalability.
The benchmark gained both academic and industrial interest and is used internally by 5+ different vendors.
</section>
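<p>
To see why unseeded pattern matching is a join choke point, consider a triangle pattern evaluated as a self-join of an edge table on vertex IDs (a toy sketch with invented data; LSQB's actual queries run on the LDBC social network schema):
</p>

```python
# Toy directed edge table; a triangle is three edges forming a cycle.
edges = [(1, 2), (2, 3), (3, 1), (2, 4)]

def triangles(E):
    """Unseeded triangle matching as a three-way join on vertex IDs."""
    eset = set(E)
    out = set()
    for a, b in E:                          # join 1: edge (a, b)
        for b2, c in E:                     # join 2: edge (b, c)
            if b2 == b and (c, a) in eset:  # join 3: closing edge (c, a)
                out.add((a, b, c))
    return out

matches = triangles(edges)
```

<p>
No vertex is fixed in advance, so the intermediate results grow with the whole graph, which is exactly the behavior the benchmark stresses.
</p>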
<h3 id="cite%3A" tabindex="-1">Cite:</h3>
<blockquote itemscope="" itemtype="http://schema.org/ScholarlyArticle">
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Mhedhbi, Amine</span>;
</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Lissandrini, Matteo</span>;
</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Kuiper, Laurens</span>;
</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Waudby, Jack</span>;
</span>
and
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Szárnyas, Gábor</span>.
</span>
<br>
“<span itemprop="headline name">LSQB: A Large-Scale Subgraph Query Benchmark</span>.”
<div class="hidden">
<time datetime="2021-06-01" itemprop="datePublished">June, 2021</time>
</div>
<span itemprop="isPartOf" itemscope="" itemtype="http://schema.org/Periodical">
<em><span itemprop="name">Proceedings of the 4th Joint International Workshop on Graph Data Management Experiences &amp; Systems (GRADES) and Network Data Analytics (NDA). Co-located with SIGMOD 2021</span></em>
</span>
</blockquote>
<pre><code class="lang-bibtex">
@inproceedings{10.1145/3461837.3464516,
author = {Mhedhbi, Amine and Lissandrini, Matteo and Kuiper, Laurens and Waudby, Jack and Sz\'{a}rnyas, G\'{a}bor},
title = {LSQB: A Large-Scale Subgraph Query Benchmark},
year = {2021},
isbn = {9781450384773},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3461837.3464516},
doi = {10.1145/3461837.3464516},
booktitle = {Proceedings of the 4th ACM SIGMOD Joint International Workshop on Graph Data Management Experiences \& Systems (GRADES) and Network Data Analytics (NDA)},
articleno = {8},
numpages = {11},
location = {Virtual Event, China},
series = {GRADES-NDA '21}
}
</code></pre>
Estimating the extent of the effects of data quality through observations2021-04-01T00:00:00Zhttps://people.cs.aau.dk/publications/conference/2021-icde-f4u-short.html<h1 id="estimating-the-extent-of-the-effects-of-data-quality-through-observations" tabindex="-1">Estimating the extent of the effects of data quality through observations</h1>
<h2 id="daniele-foroni%2C-matteo-lissandrini%2C-yannis-velegrakis" tabindex="-1">Daniele Foroni, Matteo Lissandrini, Yannis Velegrakis</h2>
<section class="links">
<p>
Download a digital copy (<a class="attachment" href="https://people.cs.aau.dk/~matteo/pdf/ICDE21-F4U-short.pdf">PDF</a>)
or see <a class="attachment" href="https://people.cs.aau.dk/~matteo/publications/demo/2021-icde-f4u-demo.html">the demo</a>
</p><p>
The final authenticated version is available online at <a href="https://doi.org/10.1109/ICDE51399.2021.00176">10.1109/ICDE51399.2021.00176</a>.
</p></section>
<h3 id="abstract%3A" tabindex="-1">Abstract:</h3>
<section class="intro secsubead">
Existing data quality works have so far focused on the computation of many data characteristics as a means of quantifying different quality dimensions, like freshness, consistency, accuracy, or completeness, all defined with respect to some ideal (clean) dataset.
We claim that this approach falls short in providing a full specification of the quality of the data, since it does not take into consideration the task for which the data is to be used, nor any future instances of the dataset.
We argue that apart from the difference from the clean dataset, it is equally important to know the degree to which such difference affects the results of the task at hand.
Thus, we extend the existing data quality definition to include that degree.
Our approach not only allows data quality to be considered in the context of the intended task, but can also provide useful information even in the absence of the clean dataset, and offers an understanding of the effect of data quality in future dataset instances.
We describe a system and its implementation that computes this extended form of data quality through a principled approach of systematic noise generation and task result evaluation.
We perform numerous experiments illustrating the effectiveness of the approach and how this allows contextualizing traditional data quality measures.
</section>
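<p>
The observation-based idea can be sketched as follows (a hedged toy example: the task here is a simple mean and the noise model is Gaussian, both stand-ins for the paper's systematic noise generation and task result evaluation):
</p>

```python
import random
import statistics

random.seed(0)  # make the sketch reproducible

data = [10.0, 12.0, 11.0, 13.0, 9.0]
clean_result = statistics.mean(data)  # task result on the current data

# Inject increasing amounts of noise and record how far the task
# result drifts from the result on the undisrupted data.
deviations = []
for noise_level in (0.1, 0.5, 1.0):
    noisy = [x + random.gauss(0, noise_level) for x in data]
    deviations.append(abs(statistics.mean(noisy) - clean_result))
```

<p>
The relationship between injected noise and the recorded deviations is what characterizes the quality of the data with respect to this particular task.
</p>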
<h3 id="cite%3A" tabindex="-1">Cite:</h3>
<blockquote itemscope="" itemtype="http://schema.org/ScholarlyArticle">
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Foroni, Daniele</span>;
</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Lissandrini, Matteo</span>;
</span>
and
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Velegrakis, Yannis</span>.
</span>
<br>
“<span itemprop="headline name">Estimating the extent of the effects of data quality through observations</span>.”
<span itemprop="isPartOf" itemscope="" itemtype="http://schema.org/Periodical">
<em><span itemprop="name">Proceedings of the 37th IEEE International Conference on Data Engineering, ICDE 2021</span></em>
</span>
<div class="hidden">
<br>
<time datetime="2021-04-01" itemprop="datePublished">April, 2021</time>
</div>
(pp. <span itemprop="pageStart">1913</span>-<span itemprop="pageEnd">1918</span>).
</blockquote>
<pre><code class="lang-bibtex">
@inproceedings{Foroni:F4U,
author = {D. Foroni and M. Lissandrini and Y. Velegrakis},
booktitle = {2021 IEEE 37th International Conference on Data Engineering (ICDE)},
title = {Estimating the extent of the effects of Data Quality through Observations},
year = {2021},
volume = {},
issn = {},
pages = {1913-1918},
keywords = {sensitivity;systematics;frequency modulation;correlation;data integrity;conferences;size measurement},
doi = {10.1109/ICDE51399.2021.00176},
url = {https://doi.ieeecomputersociety.org/10.1109/ICDE51399.2021.00176},
publisher = {IEEE Computer Society},
address = {Los Alamitos, CA, USA},
month = {apr}
}
</code></pre>
The F4U System for Understanding the Effects of Data Quality2021-04-01T00:00:00Zhttps://people.cs.aau.dk/publications/demo/2021-icde-f4u-demo.html<h1 id="the-f4u-system-for-understanding-the-effects-of-data-quality" tabindex="-1">The F4U System for Understanding the Effects of Data Quality</h1>
<h2 id="daniele-foroni%2C-matteo-lissandrini%2C-yannis-velegrakis" tabindex="-1">Daniele Foroni, Matteo Lissandrini, Yannis Velegrakis</h2>
<section class="links">
<p>
Download a digital copy (<a class="attachment" href="https://people.cs.aau.dk/~matteo/pdf/ICDE21-F4U-demo.pdf">PDF</a>)
or watch <a class="attachment" href="https://people.cs.aau.dk/~matteo/files/F4U_Demo.mp4">the Demo (<code>.mp4</code>)</a>
</p><p>
The final authenticated version is available online at <a href="https://doi.org/10.1109/ICDE51399.2021.00312">10.1109/ICDE51399.2021.00312</a>.
</p></section>
<h3 id="abstract%3A" tabindex="-1">Abstract:</h3>
<section class="intro secsubead">
We demonstrate a system that enables a data-centric approach in understanding data quality.
Instead of directly quantifying data quality as traditionally done, it disrupts the quality of the dataset and monitors the deviations in the output of an analytic task at hand.
It computes the correlation factor between the disruption and the deviation and uses it as the quality metric.
This allows users to understand not only the quality of their dataset but also the effect that present and future quality issues have on the intended analytic tasks.
This is a novel data-centric approach aimed at complementing existing solutions.
On top of the new information that it provides, and in contrast to existing data quality techniques, it requires neither knowledge of the clean dataset nor of the constraints with which the data should comply.
</section>
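<p>
The quality metric described above boils down to a correlation between the injected disruption and the observed deviation of the task output; a minimal sketch with invented measurements is:
</p>

```python
# Invented measurements: injected disruption levels and the observed
# change in the analytic task's output at each level.
disruption = [0.1, 0.2, 0.3, 0.4]
deviation = [0.05, 0.11, 0.14, 0.22]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# A correlation close to 1 signals that the task at hand is highly
# sensitive to this kind of quality issue in the dataset.
quality_signal = pearson(disruption, deviation)
```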
<h3 id="cite%3A" tabindex="-1">Cite:</h3>
<blockquote itemscope="" itemtype="http://schema.org/ScholarlyArticle">
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Foroni, Daniele</span>;
</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Lissandrini, Matteo</span>;
</span>
and
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Velegrakis, Yannis</span>.
</span>
<br>
“<span itemprop="headline name">The F4U System for Understanding the Effects of Data Quality</span>.”
<div class="hidden">
<time datetime="2021-04-01" itemprop="datePublished">April, 2021</time>
</div>
<span itemprop="isPartOf" itemscope="" itemtype="http://schema.org/Periodical">
<em><span itemprop="name">Proceedings of the 37th IEEE International Conference on Data Engineering, ICDE 2021</span></em>
</span>
(pp. <span itemprop="pageStart">2717</span>-<span itemprop="pageEnd">2720</span>).
</blockquote>
<pre><code class="lang-bibtex">
@INPROCEEDINGS{ForoniLissandrini21,
author={Foroni, Daniele and Lissandrini, Matteo and Velegrakis, Yannis},
booktitle={2021 IEEE 37th International Conference on Data Engineering (ICDE)},
title={The F4U System for Understanding the Effects of Data Quality},
year={2021},
volume={},
number={},
pages={2717-2720},
doi={10.1109/ICDE51399.2021.00312}
}
</code></pre>
Optimizing SPARQL Queries using Shape Statistics2021-03-23T00:00:00Zhttps://people.cs.aau.dk/publications/conference/2021-edbt-shacl-query-optimization.html<h1 id="optimizing-sparql-queries-using-shape-statistics" tabindex="-1">Optimizing SPARQL Queries using Shape Statistics</h1>
<h2 id="kashif-rabbani%2C-matteo-lissandrini%2C-katja-hose" tabindex="-1">Kashif Rabbani, Matteo Lissandrini, Katja Hose</h2>
<section class="links">
<p>
Download a digital copy (<a class="attachment" href="https://people.cs.aau.dk/~matteo/pdf/EDBT21-shacl-query-optimization.pdf">PDF</a>)
or visit <a class="attachment" href="https://relweb.cs.aau.dk/rdfshapes/">the official page</a>
</p><p>
The final authenticated version is available online at <a href="https://doi.org/10.5441/002/edbt.2021.59">10.5441/002/edbt.2021.59</a>.
</p></section>
<h3 id="abstract%3A" tabindex="-1">Abstract:</h3>
<section class="intro secsubead">
With the growing popularity of storing data in native RDF, we witness more and more diverse use cases with complex SPARQL queries.
As a consequence, query optimization -- and in particular cardinality estimation and join ordering -- becomes even more crucial.
Classical methods exploit global statistics covering the entire RDF graph as a whole, which naturally fails to capture the correlations that are very common in RDF datasets and thus leads to erroneous cardinality estimations and suboptimal query execution plans.
The alternative of trying to capture correlations in a fine-granular manner, on the other hand, results in very costly preprocessing steps to create these statistics.
Hence, in this paper we propose <em>shapes statistics</em>, which extend the recent SHACL standard with statistic information to capture the correlation between classes and properties.
Our extensive experiments on synthetic and real data show that shapes statistics can be generated and managed with only little overhead without disadvantages in query runtime while leading to noticeable improvements in cardinality estimation.
</section>
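<p>
As a sketch of how shape-level statistics could feed cardinality estimation (the counts, property names, and the uniformity assumption below are invented for illustration, not the paper's exact estimator):
</p>

```python
# Per (class, property): (number of triples, number of distinct subjects),
# as could be attached to a SHACL shape; the numbers are invented.
shape_stats = {
    ("ex:Author", "ex:wrote"): (50_000, 12_000),
    ("ex:Author", "ex:name"): (12_000, 12_000),
}

def estimate_pattern(cls, prop):
    """Estimated cardinality of { ?s a cls . ?s prop ?o }."""
    triples, _ = shape_stats.get((cls, prop), (0, 0))
    return triples

def estimate_join(cls, p1, p2):
    """Rough star-join estimate for { ?s p1 ?o1 . ?s p2 ?o2 } with ?s in cls,
    assuming triples are distributed uniformly over subjects."""
    t1, s1 = shape_stats.get((cls, p1), (0, 1))
    t2, s2 = shape_stats.get((cls, p2), (0, 1))
    return (t1 / max(s1, 1)) * (t2 / max(s2, 1)) * min(s1, s2)
```

<p>
Because the counts are kept per shape rather than graph-wide, class–property correlations are captured, which is what enables better join ordering.
</p>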
<h3 id="cite%3A" tabindex="-1">Cite:</h3>
<blockquote itemscope="" itemtype="http://schema.org/ScholarlyArticle">
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Rabbani, Kashif</span>;
</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Lissandrini, Matteo</span>;
</span>
and
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Hose, Katja</span>.
</span>
<br>
“<span itemprop="headline name">Optimizing SPARQL Queries using Shape Statistics</span>.”
<br>
<span itemprop="isPartOf" itemscope="" itemtype="http://schema.org/Periodical">
<em><span itemprop="name">Proceedings of the 24th International Conference on Extending Database Technology, EDBT 2021</span></em>
</span>
<div class="hidden">
<br>
<time datetime="2021-03-23" itemprop="datePublished">March, 2021</time>
<span itemprop="image">https://people.cs.aau.dk/~matteo/images/sparql-logo.png</span>
</div>
(<span itemprop="pageStart">505</span>-<span itemprop="pageEnd">510</span>).
</blockquote>
<pre><code class="lang-bibtex">
@inproceedings{DBLP:conf/edbt/RabbaniLH21,
author = {Kashif Rabbani and
Matteo Lissandrini and
Katja Hose},
title = {Optimizing {SPARQL} Queries using Shape Statistics},
booktitle = {Proceedings of the 24th International Conference on Extending Database
Technology, {EDBT} 2021, Nicosia, Cyprus, March 23 - 26, 2021},
pages = {505--510},
publisher = {OpenProceedings.org},
year = {2021},
url = {https://doi.org/10.5441/002/edbt.2021.59},
doi = {10.5441/002/edbt.2021.59},
timestamp = {Thu, 14 Oct 2021 10:06:58 +0200},
biburl = {https://dblp.org/rec/conf/edbt/RabbaniLH21.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
</code></pre>
Perspective on Knowledge Graph Exploration: Where Are We and Where Are We Going?2020-07-29T00:00:00Zhttps://people.cs.aau.dk/notes/kg-exploration-sigweb-survey.html<!-- 10-2023 -->
<section class="intro secsubhead">
<dl>
<dt>
<b>Q) What is <em>Knowledge Graph Exploration</em>?</b>
</dt>
<dd>
<b>A)</b> a machine-assisted process of analysis of a KG to:<br>
(1) understand its structure, <br>
(2) identify whether it can satisfy the current information need, and <br>
(3) locate the portion of the KG pertinent to the current need
</dd>
</dl>
</section>
<figure class="body-figure large">
<img src="https://people.cs.aau.dk/~matteo//images/kg-exploration-taxonomy.png" alt="Taxonomy of KG Exploration techniques and their positioning on the spectrum of features. Top KG Exploration, leaves: Summarization/Profiling, Exploratory Analytics, Exploratory search.">
<figcaption>
The taxonomy of KG Exploration techniques that we propose and their mapping on the interactivity / domain-knowledge requirements spectrum.
</figcaption>
</figure>
<p>
In this work we divide techniques in:
</p>
<ul>
<li>Summarization/Profiling</li>
<li>Exploratory Analytics</li>
<li>Exploratory Search</li>
</ul>
<p>
These three techniques are mapped on a spectrum ranging from no interaction and no domain-knowledge requirement to high interaction and a higher domain-knowledge requirement.
</p>
<p>
Less interactive methods require less domain knowledge: the system does all the work given some general definition of "interestingness". Yet, they also produce less detailed answers.
More interactive approaches require more domain knowledge, but can provide more details.
</p>
<p>
We are actively working on this in <a href="https://people.cs.aau.dk/~matteo/notes/edao-announce.html">the EDAO project</a>.
</p>
<p>
You can <a href="https://people.cs.aau.dk/~matteo/publications/journal/2020-SIGWEB-graph-exploration.html">find out more in our publication on the SIGWEB Newsletter (ed. Summer, 2020)</a>.
</p>
<p>
<a href="https://people.cs.aau.dk/~matteo/publications/conference/2022-CIDR-graph-exploration.html">An extended version of this work</a> has also been presented at CIDR 2022.
</p>
MindReader: Recommendation over Knowledge Graph Entities with Explicit User Ratings2020-07-27T00:00:00Zhttps://people.cs.aau.dk/publications/conference/2020-cikm-mindreader.html<h1 id="mindreader%3A-recommendation-over-knowledge-graph-entities-with-explicit-user-ratings" tabindex="-1">MindReader: Recommendation over Knowledge Graph Entities with Explicit User Ratings</h1>
<h2 id="anders-brams%2C-anders-jakobsen%2C-theis-jendal%2C-matteo-lissandrini%2C-peter-dolog-and-katja-hose" tabindex="-1">Anders Brams, Anders Jakobsen, Theis Jendal, Matteo Lissandrini, Peter Dolog and Katja Hose</h2>
<section class="links">
<p>
Download a digital copy (<a class="attachment" href="https://people.cs.aau.dk/~matteo/pdf/CIKM20-mindreader.pdf">PDF</a>)
or <a href="https://mindreader.tech/">Test the tool</a>
and <a href="https://mindreader.tech/dataset/">Download the data</a>
</p><p>
The final authenticated version is available online at <a href="https://doi.org/10.1145/3340531.3412759">10.1145/3340531.3412759</a>.
</p></section>
<h3 id="abstract-%3A" tabindex="-1">Abstract:</h3>
<section class="intro secsubead">
Knowledge graphs have been integrated in several models of recommendation to augment the informational value of an item by means of its related entities in the graph.
Yet, existing datasets only provide explicit ratings on items, and no information is provided about user opinions of other (non-recommendable) entities.
To overcome this limitation, we introduce a new dataset, called MindReader, providing explicit user ratings both for items and for knowledge graph entities.
In this first version, the MindReader dataset provides more than 102 thousand explicit ratings collected from 1,174 real users on both items and entities from a knowledge graph in the movie domain.
This dataset has been collected through an online interview application that we also release open source.
As a demonstration of the importance of this new dataset, we present a comparative study of the effect of the inclusion of ratings on non-item knowledge graph entities in a variety of state-of-the-art recommendation models.
In particular, we show that most models, whether designed specifically for graph data or not, see improvements in recommendation quality when trained on explicit non-item ratings.
Moreover, for some models, we show that non-item ratings can effectively replace item ratings without loss of recommendation quality.
This finding, thanks also to an observed greater familiarity of users towards common knowledge graph entities than towards long-tail items, motivates the use of knowledge graph entities for both warm and cold-start recommendations.
</section>
<section class="links">
<p>
<a href="https://mindreader.tech/">Test the tool</a>
or <a href="https://mindreader.tech/dataset/">download the data</a>.
</p>
</section>
<h3 id="cite%3A" tabindex="-1">Cite:</h3>
<blockquote itemscope="" itemtype="http://schema.org/ScholarlyArticle">
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Brams, Anders</span>;
</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Jakobsen, Anders</span>;
</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Jendal, Theis</span>;
</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Lissandrini, Matteo</span>;
</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Dolog, Peter</span>;
</span>
and
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Hose, Katja</span>.
</span>
<br>
“<span itemprop="headline name">MindReader: Recommendation over Knowledge Graph Entities with Explicit User Ratings</span>.”
<br>
<span itemprop="isPartOf" itemscope="" itemtype="http://schema.org/Periodical">
<em><span itemprop="name">Proceedings of the Conference on Information and Knowledge Management (CIKM 2020)</span></em>
</span> (pp. <span itemprop="pageStart">2975</span>-<span itemprop="pageEnd">2982</span>).
<div class="hidden">
<time datetime="Mon Jul 27 2020 00:00:00 GMT+0000 (Coordinated Universal Time)" itemprop="datePublished">July, 2020</time>
<span itemprop="image">https://mindreader.tech/favicon.ico</span>
</div>
</blockquote>
<pre><code class="lang-bibtex">
@inproceedings{10.1145/3340531.3412759,
author = {Brams, Anders H. and Jakobsen, Anders L. and Jendal, Theis E. and Lissandrini, Matteo and Dolog, Peter and Hose, Katja},
title = {MindReader: Recommendation over Knowledge Graph Entities with Explicit User Ratings},
year = {2020},
isbn = {9781450368599},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3340531.3412759},
doi = {10.1145/3340531.3412759},
booktitle = {Proceedings of the 29th ACM International Conference on Information & Knowledge Management},
pages = {2975–2982},
numpages = {8},
keywords = {knowledge graph, dataset, recommender systems, content-based filtering, collaborative filtering},
location = {Virtual Event, Ireland},
series = {CIKM '20}
}
</code></pre>
Knowledge Graph Exploration: Where Are We and Where Are We Going?2020-07-27T00:00:00Zhttps://people.cs.aau.dk/publications/journal/2020-sigweb-graph-exploration.html<h1 id="knowledge-graph-exploration%3A-where-are-we-and-where-are-we-going%3F-(invited-article)" tabindex="-1">Knowledge Graph Exploration: Where Are We and Where Are We Going? <small>(Invited Article)</small></h1>
<h2 id="matteo-lissandrini%2C-torben-bach-pedersen%2C-katja-hose%2C-davide-mottin" tabindex="-1">Matteo Lissandrini, Torben Bach Pedersen, Katja Hose, Davide Mottin</h2>
<section class="links">
<p>
Download a digital copy (<a class="attachment" href="https://people.cs.aau.dk/~matteo/pdf/SIGWEB-2020-kgexp.pdf">PDF</a>)
</p><p>
The final authenticated version is available online at <a href="https://doi.org/10.1145/3409481.3409485">10.1145/3409481.3409485</a>.
</p></section>
<h3 id="abstract%3A" tabindex="-1">Abstract:</h3>
<section class="intro secsubead">
Knowledge graphs (KGs) represent facts in the form of subject-predicate-object triples and are widely used to represent and share knowledge on the Web.
Their ability to represent data in complex domains augmented with semantic annotations has attracted the attention of both research and industry.
Yet, their widespread adoption in various domains and their generation processes have made the contents of these resources complicated.
We speak of <em>knowledge graph exploration</em> as the gradual discovery and understanding of the contents of a large and unfamiliar KG.
In this paper, we present an overview of the state-of-the-art approaches for KG exploration.
We divide them into three areas: profiling, search, and analysis, and we argue that, while KG profiling and KG exploratory search have received considerable attention, exploratory KG analytics is still in its infancy.
We conclude with an overview of promising future research directions towards the design of more advanced KG exploration techniques.
</section>
<figure class="body-figure large">
<img src="https://people.cs.aau.dk/~matteo/images/kg-exploration-taxonomy.png" alt="Taxonomy of KG Exploration techniques and their positioning on the spectrum of features. Top KG Exploration, leaves: Summarization/Profiling, Exploratory Analytics, Exploratory search.">
<figcaption>
The taxonomy of KG Exploration techniques that we propose and their mapping on the interactivity / domain-knowledge requirements spectrum.
</figcaption>
</figure>
<h3 id="cite%3A" tabindex="-1">Cite:</h3>
<blockquote itemscope="" itemtype="http://schema.org/ScholarlyArticle">
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Lissandrini, Matteo</span>;
</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Pedersen, Torben Bach</span>;
</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Hose, Katja</span>;
</span>
and
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Mottin, Davide</span>.
</span>
<br>
“<span itemprop="headline name">Knowledge Graph Exploration: Where Are We and Where Are We Going?</span>.”
<br>
<time datetime="Mon Jul 27 2020 00:00:00 GMT+0000 (Coordinated Universal Time)" itemprop="datePublished">July, 2020</time>
<div class="hidden">
<span itemprop="image">https://people.cs.aau.dk/~matteo/images/xq-logo.png</span>
</div>
<span itemprop="isPartOf" itemscope="" itemtype="http://schema.org/Periodical">
<em><span itemprop="name">SIGWEB Newsletter</span></em>
</span> (<span itemprop="pageStart">1</span>-<span itemprop="pageEnd">8</span>).
</blockquote>
<pre><code class="lang-bibtex">
@article{LissandriniKGE20,
author = {Lissandrini, Matteo and Pedersen, Torben Bach and Hose, Katja and Mottin, Davide},
title = {Knowledge Graph Exploration: Where Are We and Where Are We Going?},
year = {2020},
issue_date = {Summer 2020},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
number = {Summer 2020},
issn = {1931-1745},
url = {https://doi.org/10.1145/3409481.3409485},
doi = {10.1145/3409481.3409485},
journal = {SIGWEB Newsl.},
month = jul,
articleno = {4},
numpages = {8}
}
</code></pre>
Example-based Exploration: Exploring Knowledge through Examples2020-05-31T00:00:00Zhttps://people.cs.aau.dk/publications/tutorial/2020-eswc-exemplar-tutorial.html<h1 id="example-based-exploration%3A-exploring-knowledge-through-examples" tabindex="-1">Example-based Exploration: Exploring Knowledge through Examples</h1>
<h2 id="matteo-lissandrini%2C-davide-mottin%2C-themis-palpanas%2C-yannis-velegrakis" tabindex="-1">Matteo Lissandrini, Davide Mottin, Themis Palpanas, Yannis Velegrakis</h2>
<div class="box-special">
<p>
The content of this tutorial has been expanded in a book: <br><a href="https://people.cs.aau.dk/~matteo/publications/book/2018-mc-exemplar.html" class="call-to-action">Find out more!</a>
</p>
<p>You can also visit the official page on <a href="https://data-exploration.ml/eswc2020.html">data-exploration.ml</a></p>
</div>
<h3 id="abstract%3A" tabindex="-1">Abstract:</h3>
<section class="intro secsubead">
<p>
Exploration is one of the primordial ways to accrue knowledge about the world and its nature. As we accumulate, mostly automatically, data at unprecedented volumes and speed, our datasets have become complex and hard to understand. In this context, exploratory search provides a handy tool for progressively gathering the necessary knowledge by starting from a tentative query that can provide cues about the next queries to issue.
</p>
<p>
An exploratory query should be simple enough to avoid complicated declarative languages (such as SQL or SPARQL) and convoluted mechanisms, while at the same time retaining the flexibility and expressiveness required to express complex information needs. Recently, we have witnessed a rediscovery of the so-called example-based methods, in which the user, or the analyst, circumvents query languages by using examples as input.
</p>
<p>
This shift in semantics has led to a number of methods receiving as a query a set of example members of the answer set. The search system then infers the entire answer set based on the given examples and any additional information provided by the underlying database. In this tutorial, we present an excursus over the main example-based methods for exploratory analysis. We show how different data types require different techniques, and present algorithms that are specifically designed for relational, textual, and graph data. We conclude by providing a unifying view of this query paradigm and identify new exciting research directions.
</p>
<p>
<b>The tutorial has been presented on June 3, 2020, at ESWC</b>
<a href="https://2020.eswc-conferences.org/">See the official program.</a>
</p>
</section>
Transparent Integration and Sharing of Life Cycle Sustainability Data with Provenance2020-04-20T00:00:00Zhttps://people.cs.aau.dk/publications/conference/2020-iswc-bonsai.html<h1 id="transparent-integration-and-sharing-of-life-cycle-sustainability-data-with-provenance" tabindex="-1">Transparent Integration and Sharing of Life Cycle Sustainability Data with Provenance</h1>
<h2 id="emil-riis-hansen%2C-matteo-lissandrini%2C-agneta-ghose%2C-s%C3%B8ren-l%C3%B8kke%2C-christian-thomsen%2C-katja-hose" tabindex="-1">Emil Riis Hansen, Matteo Lissandrini, Agneta Ghose, Søren Løkke, Christian Thomsen, Katja Hose</h2>
<section class="links">
<p>
Download a digital copy (<a class="attachment" href="https://people.cs.aau.dk/~matteo/pdf/ISWC20-bonsai.pdf">PDF</a>)
</p><p>
The final authenticated version is available online at <a href="https://doi.org/10.1007/978-3-030-62466-8_24">10.1007/978-3-030-62466-8_24</a>.
</p></section>
<h3 id="abstract-%3A" tabindex="-1">Abstract:</h3>
<section class="intro secsubead">
Life Cycle Sustainability Analysis (LCSA) studies the complex processes describing product life cycles and their impact on the environment, economy, and society.
Effective and transparent sustainability assessment requires access to data from a variety of heterogeneous sources across countries, scientific and economic sectors, and institutions.
Moreover, given their important role for governments and policymakers, the results of many different steps of this analysis should be made freely available, alongside the information about how they have been computed in order to ensure accountability.
In this paper, we describe how Semantic Web technologies in general and PROV-O in particular, are used to enable transparent sharing and integration of datasets for LCSA.
We describe the challenges we encountered in helping a community of domain experts with no prior expertise in Semantic Web technologies to fully overcome the limitations of their current practice in integrating and sharing open data.
This resulted in the first nucleus of an open data repository of information about global production.
Furthermore, we describe how we enable domain experts to track the provenance of particular pieces of information that are crucial in higher-level analysis.
</section>
<section class="links">
<p>
Additional material is available <a href="https://bonsai.uno/">on the BONSAI community portal</a> and <a href="https://github.com/BONSAMURAIS">GitHub</a>.
</p>
</section>
<h3 id="cite%3A" tabindex="-1">Cite:</h3>
<blockquote itemscope="" itemtype="http://schema.org/ScholarlyArticle">
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Hansen, Emil Riis</span>;
</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Lissandrini, Matteo</span>;
</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Ghose, Agneta</span>;
</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Løkke, Søren</span>;
</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Thomsen, Christian</span>;
</span>
and
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Hose, Katja</span>.
</span>
<br>
“<span itemprop="headline name">Transparent Integration and Sharing of Life Cycle Sustainability Data with Provenance</span>.”
<br>
<span itemprop="isPartOf" itemscope="" itemtype="http://schema.org/Periodical">
<em><span itemprop="name">Proceedings of International Semantic Web Conference (ISWC 2020)</span></em>
</span> (pp. <span itemprop="pageStart">378</span>-<span itemprop="pageEnd">394</span>).
<div class="hidden">
<time datetime="Mon Apr 20 2020 00:00:00 GMT+0000 (Coordinated Universal Time)" itemprop="datePublished">April, 2020</time>
<span itemprop="image">https://people.cs.aau.dk/~matteo/images/bonsai-logo.png</span>
</div>
</blockquote>
<pre><code class="lang-bibtex">
@inproceedings{HLGLTH:Transparent,
author="Hansen, Emil Riis and Lissandrini, Matteo and Ghose, Agneta
and L{\o}kke, S{\o}ren and Thomsen, Christian and Hose, Katja",
title="Transparent Integration and Sharing of Life Cycle Sustainability Data with Provenance",
booktitle="The Semantic Web -- ISWC 2020",
year="2020",
publisher="Springer International Publishing",
pages="378--394",
isbn="978-3-030-62466-8",
doi="10.1007/978-3-030-62466-8\_24"
}
</code></pre>
Graph-Query Suggestions for Knowledge Graph Exploration2020-04-20T00:00:00Zhttps://people.cs.aau.dk/publications/conference/2020-webconf-iexq.html<h1 id="graph-query-suggestions-for-knowledge-graph-exploration" tabindex="-1">Graph-Query Suggestions for Knowledge Graph Exploration</h1>
<h2 id="matteo-lissandrini%2C-davide-mottin%2C-themis-palpanas%2C-yannis-velegrakis" tabindex="-1">Matteo Lissandrini, Davide Mottin, Themis Palpanas, Yannis Velegrakis</h2>
<section class="links">
<p>
Download a digital copy (<a class="attachment" href="https://people.cs.aau.dk/~matteo/pdf/WebConf20-interactive.pdf">PDF</a>)
or watch <a href="https://www.youtube.com/watch?v=LZcrKUTL4GA">the Presentation (YouTube)</a>
and see <a href="https://people.cs.aau.dk/~matteo/pdf/WebConf20-interactive-SLIDES_export.pdf"> the Slides (PDF)</a>
</p><p>
The final authenticated version is available online at <a href="https://doi.org/10.1145/3366423.3380005">10.1145/3366423.3380005</a>.
</p></section>
<h3 id="abstract-%3A" tabindex="-1">Abstract:</h3>
<section class="intro secsubead">
We consider the task of exploratory search through graph queries on knowledge graphs.
We propose to assist the user by expanding the query with <em>intuitive</em> suggestions to provide a more informative (full) query that can retrieve more detailed and relevant answers.
To achieve this result, we propose a model that can bridge graph search paradigms with well-established techniques for information retrieval.
Our approach does not require any additional knowledge from the user and builds on principled language modelling approaches.
We empirically show the effectiveness and efficiency of our approach on a large knowledge graph and how our suggestions are able to help build more complete and informative queries.
</section>
<!-- ### Presented <a href="https://calendar.google.com/calendar/r/month/2020/4/22?eid=MnExbTdpbzRwZWJjajNyZDBoaGtkcGUxa2ggdmlxcHRoNDJpMmUwcXZ2aDRkMzF0YWZubTRAZw&ctz=Asia/Taipei&sf=true"><time datetime="2020-04-22T12:45:00.000+0800">Wed. April 22nd, 12:45-13:00 (UTC/GMT +8)</time> <b>online</b></a> see <a href="https://www2020.citi.sinica.edu.tw/schedule/research_track/#Search-1">the program</a>
-->
<section class="links">
<p>
Additional material is available: <a href="https://people.cs.aau.dk/~matteo/files/exp-queries.zip">Test Queries</a> and <a href="https://people.cs.aau.dk/~matteo/notes/freebase-data-dump.html">Dataset</a>.
</p>
</section>
<h3 id="cite%3A" tabindex="-1">Cite:</h3>
<blockquote itemscope="" itemtype="http://schema.org/ScholarlyArticle">
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Lissandrini, Matteo</span>;
</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Mottin, Davide</span>;
</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Palpanas, Themis</span>;
</span>
and
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Velegrakis, Yannis</span>.
</span>
<br>
“<span itemprop="headline name">Graph-Query Suggestions for Knowledge Graph Exploration</span>.”
<br>
<span itemprop="isPartOf" itemscope="" itemtype="http://schema.org/Periodical">
<em><span itemprop="name">Proceedings of the World Wide Web Conference (WebConf 2020)</span></em>
</span> (pp. <span itemprop="pageStart">2549</span>-<span itemprop="pageEnd">2555</span>).
<div class="hidden">
<time datetime="Mon Apr 20 2020 00:00:00 GMT+0000 (Coordinated Universal Time)" itemprop="datePublished">April, 2020</time>
<span itemprop="image">https://people.cs.aau.dk/~matteo/images/xq-logo.png</span>
</div>
</blockquote>
<pre><code class="lang-bibtex">
@inproceedings{lissandrini2020graph,
title={Graph-Query Suggestions for Knowledge Graph Exploration},
author={Lissandrini, Matteo and Mottin, Davide and Palpanas, Themis and Velegrakis, Yannis},
booktitle={Proceedings of The Web Conference 2020},
pages={2549--2555},
doi = {10.1145/3366423.3380005},
year={2020}
}
</code></pre>
Personalized Page Rank on Knowledge Graphs: Particle Filtering is all you need!2020-03-30T00:00:00Zhttps://people.cs.aau.dk/publications/conference/2020-edbt-particle-filtering.html<h1 id="personalized-page-rank-on-knowledge-graphs%3A-particle-filtering-is-all-you-need!" tabindex="-1">Personalized Page Rank on Knowledge Graphs: Particle Filtering is all you need!</h1>
<h2 id="denis-gallo%2C-matteo-lissandrini%2C-yannis-velegrakis" tabindex="-1">Denis Gallo, Matteo Lissandrini, Yannis Velegrakis</h2>
<section class="links">
<p>
Download a digital copy (<a class="attachment" href="https://people.cs.aau.dk/~matteo/pdf/EDBT20-particle-filtering.pdf">PDF</a>)
or see <a class="attachment" href="https://people.cs.aau.dk/~matteo/pdf/EDBT20-poster.pdf">the Poster (PDF)</a>
</p><p>and</p>
<p>watch <a class="attachment" href="https://youtu.be/Spr6H4IhsKk">the presentation (YouTube)</a></p>
<p></p><p>
The final authenticated version is available online at <a href="https://doi.org/10.5441/002/edbt.2020.54">10.5441/002/edbt.2020.54</a>.
</p></section>
<h3 id="abstract%3A" tabindex="-1">Abstract:</h3>
<section class="intro secsubead">
Graphs are everywhere.
Personalized Page Rank (PPR) is a particularly important task to support search and exploration within such datasets.
PPR computes the proximity between query nodes and other nodes in the graph.
This is used, among other things, for entity exploration, query expansion, and product recommendation.
Graph databases are used for storing knowledge graphs.
Unfortunately, the exact computation of PPR is computationally expensive.
While different solutions have been proposed to compute PPR values with high precision, these are extremely complex to implement, and in some cases require heavy pre-processing.
In this work we argue that a better approach exists: <em>particle filtering</em>.
Particle filtering methods produce ranks with sufficient precision while exploiting what graph database architectures are already optimized for: navigating local connections.
We present the implementation of such an approach in a popular commercial database, and show how it outperforms the already implemented functionality.
With this, we aim to motivate future research to optimize and improve upon this research direction.
</section>
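<p>The core idea of the paper can be illustrated with a small sketch (not the paper's actual implementation, which runs inside a commercial graph database): probability mass, viewed as "particles", starts at the query node and is spread along out-edges, with particles below a pruning threshold discarded so that only local connections are ever navigated. All names and parameter values here are illustrative assumptions.</p>

```python
from collections import defaultdict

def particle_filter_ppr(adj, source, alpha=0.85, particles=1000.0, min_threshold=0.1):
    """Approximate Personalized PageRank from `source`.

    `adj` maps each node to its list of out-neighbors.  Each round, a
    fraction (1 - alpha) of the mass at a node "stops" there (this is
    the accumulated rank), and the rest is split among its neighbors.
    Particles whose per-neighbor share drops below `min_threshold`
    are pruned, which keeps the frontier local and guarantees
    termination."""
    frontier = {source: particles}   # current mass per node
    rank = defaultdict(float)        # accumulated PPR estimate
    while frontier:
        nxt = defaultdict(float)
        for node, mass in frontier.items():
            rank[node] += mass * (1 - alpha)   # mass that stops here
            neighbors = adj.get(node, [])
            if not neighbors:
                continue
            share = mass * alpha / len(neighbors)
            if share < min_threshold:
                continue                        # prune tiny particles
            for n in neighbors:
                nxt[n] += share
        frontier = nxt
    total = sum(rank.values())
    return {n: r / total for n, r in rank.items()}
```

<p>Note how the loop only ever touches neighbors of nodes currently holding mass, which is exactly the local-navigation access pattern graph databases are optimized for.</p>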
<h3 id="cite%3A" tabindex="-1">Cite:</h3>
<blockquote itemscope="" itemtype="http://schema.org/ScholarlyArticle">
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Gallo, Denis</span>;
</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Lissandrini, Matteo</span>;
</span>
and
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Velegrakis, Yannis</span>.
</span>
<br>
“<span itemprop="headline name">Personalized Page Rank on Knowledge Graphs: Particle Filtering is all you need!</span>.”
<br>
<span itemprop="isPartOf" itemscope="" itemtype="http://schema.org/Periodical">
<em><span itemprop="name">Proceedings of the 23rd International Conference on Extending Database Technology, EDBT 2020</span></em>
</span> (<span itemprop="pageStart">447</span>-<span itemprop="pageEnd">450</span>).
<div class="hidden">
<time datetime="2020-03-30" itemprop="datePublished">March, 2020</time>
<span itemprop="image">https://people.cs.aau.dk/~matteo/images/graph-logo.png</span>
</div>
</blockquote>
<pre><code class="lang-bibtex">
@inproceedings{gallo2020personalized,
title={Personalized page rank on knowledge graphs: Particle Filtering is all you need!},
author={Gallo, Denis and Lissandrini, Matteo and Velegrakis, Yannis},
booktitle = {Proceedings of the 23rd International Conference on Extending Database Technology, {EDBT} 2020},
volume={2020},
pages={447--450},
doi = {10.5441/002/edbt.2020.54},
year={2020},
organization={OpenProceedings.org}
}
</code></pre>
EDAO: Example Driven Analytics of Open Knowledge Graphs2019-12-02T00:00:00Zhttps://people.cs.aau.dk/notes/edao-announce.html<!-- 10-2023 -->
<figure class="body-figure small">
<img src="https://people.cs.aau.dk/~matteo/images/MSCA-logo.png" alt="Marie Skłodowska-Curie Actions logo">
<figcaption>
The Marie Skłodowska-Curie actions support researchers at all stages of their careers, cooperation between industry and academia, and innovative training to enhance employability and career development.
</figcaption>
</figure>
<section class="intro secsubhead">
<p>
I'm both honoured and very excited to announce that I've been granted a Marie Skłodowska-Curie Individual Fellowship to work on <a href="https://edao.eu/"><em>Example Driven Analytics of Open Knowledge Graphs</em> (EDAO)</a>.
</p>
<p>
In this project I will work at the <a href="https://www.daisy.aau.dk/">Center for Data-intensive Systems (DAISY)</a> at Aalborg University, and will collaborate with <a href="http://people.cs.aau.dk/~tbp/">prof. Torben Bach Pedersen</a>.
</p>
</section>
<figure class="body-figure large">
<img src="https://people.cs.aau.dk/~matteo/images/edao-example.png" alt="From an Example like 'France 67M 2019' to detailed statistics of population growth in french speaking countries or european countries.">
<figcaption>
The EDAO project aims to study methods to help users to move from an example of interest to insights by tapping into the Linked Open Data.
</figcaption>
</figure>
<p>
The <strong>EDAO</strong> project aims to study a new <a href="http://data-exploration.ml/">Example-Driven Exploration</a> system that bridges the gap between example-based queries and BI methods for Linked Open Data.
</p>
<p>
I'm really thankful to the <a href="https://ec.europa.eu/research/mariecurieactions/">Marie Skłodowska-Curie Actions
Research Fellowship programme</a> for this opportunity, and I'm looking forward to tackling many exciting and challenging problems.
</p>
<p>
The project has just started; more information will be posted soon on <a href="https://edao.eu/">the official website</a>.
You can also follow the developments on Twitter: <a href="https://twitter.com/EDAO_eu">@EDAO_eu</a>.
</p>
SEA Data: Search, Exploration, and Analysis in Heterogeneous Datastores2019-10-29T00:00:00Zhttps://people.cs.aau.dk/notes/sea-data-announce.html<!-- 10-2023 -->
<figure class="body-figure small">
<img src="https://people.cs.aau.dk/~matteo/images/sea-site-cover.png" alt="Sea Data website">
<figcaption>
The 1st Workshop on Search, Exploration, and Analysis in Heterogeneous Datastores, <b>Co-located with EDBT/ICDT 2020 (30 March 2020, Copenhagen, Denmark)</b>
</figcaption>
</figure>
<section class="intro secsubhead">
<p>
I'm very excited to announce the <a href="https://sea-data.ml/">1st Workshop on Search, Exploration, and Analysis in Heterogeneous Datastores</a> organized at <a href="https://diku-dk.github.io/edbticdt2020/">EDBT/ICDT 2020</a> by my colleagues <a href="http://mott.in/">Davide Mottin</a> and <a href="https://velgias.github.io/">Yannis Velegrakis</a>, and me.
</p>
<p>
The SEA Data workshop will provide a forum for researchers and practitioners to exchange ideas, results, and visions on <em>challenges in data management, information extraction, exploration, and analysis of heterogeneous data and multiple data models at once.</em>
</p>
</section>
<p>
Moreover, we are extremely thankful to the amazing researchers (from both industry and academia) who agreed to be part of our program committee.
</p>
<p>
<b>SEA Data</b> aims at gathering researchers and practitioners from various communities related to databases.
<em>We gladly accept submissions that present initial ideas and visions, just as much as reports on early results, or reflections on completed projects.</em>
The workshop will focus on discussion and interaction, rather than static presentations of what is in the paper.
You can see a list of relevant topics on the official website and in our <a href="https://sea-data.ml/cfp.txt">call for papers</a>.
</p>
<p>
We are looking forward to a very interesting workshop, so please
<a class="call-to-action" href="https://sea-data.ml/">Submit your work!</a>
</p>
Keywords in paper titles from KDD'19 and ICDE'202019-10-15T00:00:00Zhttps://people.cs.aau.dk/notes/kdd19-icde20-keywords.html<!-- 10-2023 -->
<section class="intro secsubhead">
<p>
Frequency of <code>keywords</code> in paper titles from KDD'19.
</p>
</section>
<pre><code>
network 242
graph 170
edge 49
knowledge 39
discover 37
rank 32
? 14
subgraph 12
explor 10
'Can ' 7
</code></pre>
<section class="intro secsubhead">
<p>
Frequency of <code>keywords</code> in paper titles from ICDE'20 first round.
</p>
</section>
<pre><code>
network 49
graph 41
Deep 20
edge 18
knowledge 14
rank 11
neural network 7
discover 6
subgraph 6
quality 6
explor 5
exploration 3
pagerank 1
</code></pre>
<section class="intro secsubhead">
<p>
Frequency of <code>keywords</code> in paper titles from ICDE'20 second round.
</p>
</section>
<pre><code>
graph 70
network 54
edge 29
knowledge 24
Deep 18
discover 12
neural network 9
rank 8
subgraph 3
quality 2
explor 3
exploration 2
pagerank 0
</code></pre>
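<p>For the curious: one plausible way to produce counts like the ones above is a case-insensitive substring match over the list of titles, so that a stem such as <code>explor</code> also covers "Explore" and "Exploration". This is a sketch of the idea, not the script actually used; the titles below are made up for illustration.</p>

```python
from collections import Counter

def keyword_counts(titles, keywords):
    """Count how many titles contain each keyword, matching
    case-insensitive substrings (so 'explor' matches 'Exploration')."""
    counts = Counter()
    for title in titles:
        low = title.lower()
        for kw in keywords:
            if kw.lower() in low:
                counts[kw] += 1
    return counts

# Hypothetical titles, purely for illustration:
titles = [
    "Exploring Knowledge Graphs at Scale",
    "Deep Neural Networks for Ranking",
    "Subgraph Discovery in Networks",
]
print(keyword_counts(titles, ["graph", "network", "explor", "rank"]))
```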
2019 Conferences Round-Up: ESWC, SIGMOD, SIGIR, VLDB2019-09-14T00:00:00Zhttps://people.cs.aau.dk/notes/conferences-2019.html<!-- 10-2023 -->
<section class="intro secsubhead">
<p>
This year I was lucky enough to participate in some of the major conferences in the areas of databases, data management, the Semantic Web, and information retrieval,
namely (and in chronological order):
</p><p><a href="https://2019.eswc-conferences.org/">ESWC'19</a>,
<a href="https://sigmod2019.org/">SIGMOD'19</a>,
<a href="https://sigir.org/sigir2019/">SIGIR'19</a>, and
<a href="https://vldb.org/2019/">VLDB'19</a>.</p>
<p></p>
<p>
I've listened to great presentations and I've been exposed to exciting problems, research topics, and ideas.
</p>
<p>
There are brilliant trip reports about them.
I warmly invite you to read those as well (some are linked below) and to dig into the respective conference proceedings.
Here I share my experience, point to some of the works that most resonated with me, and reference the work I was presenting at each venue.
</p>
</section>
<p>You can jump directly to my notes for
<a href="https://people.cs.aau.dk/notes/conferences-2019.html#eswc-2019">ESWC'19</a>,
<a href="https://people.cs.aau.dk/notes/conferences-2019.html#sigmod-2019">SIGMOD'19</a>,
<a href="https://people.cs.aau.dk/notes/conferences-2019.html#sigir-2019">SIGIR'19</a>, or
<a href="https://people.cs.aau.dk/notes/conferences-2019.html#vldb-2019">VLDB'19</a>.</p>
<h3 id="eswc-2019" tabindex="-1">ESWC 2019</h3>
<figure class="body-figure">
<img src="https://people.cs.aau.dk/~matteo/images/ESWC19-logo.png" alt="ESWC 2019 Logo, Portoroz">
<figcaption>
<a href="https://2019.eswc-conferences.org/">ESWC 2019</a>, <time datetime="2019-06-02">June 2</time>-<time datetime="2019-06-06">6</time>, in Portorož (Slovenia).
</figcaption>
</figure>
<p>It was for me the first time attending the
<a href="https://eswc-conferences.org/">Extended Semantic Web Conference</a>,
the only other time I participated in a Semantic Web conference was quite a long time ago (it was <a href="https://people.cs.aau.dk/~matteo/notes/iswc-2014-keynote.html">ISWC'14</a>).</p>
<p>I was there presenting the early results of a collaboration with the
<a href="https://bonsai.uno/">BONSAI organization</a>
in the effort to build
<a href="https://people.cs.aau.dk/~matteo/publications/demo/2019-eswc-bonsai.html">«An Open Source Dataset and Ontology for Product Footprinting»</a>.
A truly interdisciplinary effort with the goal of <em>allowing the science of lifecycle assessment to be performed in a more transparent and more reproducible way.</em>
Incidentally, this work received the
<a href="https://2019.eswc-conferences.org/awards/">best poster award</a>.</p>
<h4 id="interesting-tidbits%3A-the-power-of-knowledge-graphs..." tabindex="-1">Interesting tidbits: the power of Knowledge Graphs...</h4>
<p>It should not surprise anyone that the words <em>“knowledge”</em> and <em>“graphs”</em> appeared multiple times on the large screens of the Hotel Bernardin.</p>
<p>One of the presentations that put knowledge graphs under the spot-light was the
<a href="https://2019.eswc-conferences.org/keynote-peter-haase/">keynote by Peter Haase</a>.
It was an unconventional keynote in a sense, since, instead of a deck of slides, the talk was a knowledge graph itself, <em>explored live!</em>
It really showed how natural it is <em>to organize and explore knowledge as entities connected by facts</em> and how this allows connecting multiple domains as well as heterogeneous sources of information.</p>
<p>Once information is accessible in the form of a knowledge graph, a lot of interesting possibilities arise.
You can enhance
<a href="https://link.springer.com/chapter/10.1007/978-3-030-21348-0_15">dialogue systems</a>,
provide more interesting and
<a href="https://link.springer.com/chapter/10.1007/978-3-030-21348-0_38">personalized recommendations</a>,
better understand
<a href="https://link.springer.com/chapter/10.1007/978-3-030-21348-0_22">vague user queries</a>,
or better
<a href="https://duckduckgo.com/?q=%22Using+Knowledge+Graphs+to+Search+an+Enterprise+Data+Lake%22&t=h_&ia=web">search for information in an enterprise data lake</a>.</p>
<p>Another very interesting topic relates to the unique possibility, offered by RDF and SPARQL, to allow
<a href="https://link.springer.com/chapter/10.1007/978-3-030-21348-0_2">query answering over information that is not materialized in your knowledge graphs, but derived by its structure and annotations</a>.</p>
<p>Different techniques and tools have also been presented to
<a href="https://link.springer.com/chapter/10.1007/978-3-030-21348-0_4">retrieve textual evidence of facts contained in a knowledge graph</a>,
enrich a knowledge graph through
<a href="https://link.springer.com/chapter/10.1007/978-3-030-21348-0_19">relation extraction</a>,
(and you can do that with the help of a human employing an
<a href="http://data-exploration.ml/">example based approach</a>),
build a
<a href="https://link.springer.com/chapter/10.1007/978-3-030-21348-0_1">fully decentralized p2p repository of semantic data</a>,
<a href="https://link.springer.com/chapter/10.1007/978-3-030-21348-0_20">embed reasoning capabilities</a> in mobile applications,
and
<a href="http://owlgred.lumii.lv/">visualize and edit ontologies</a>.</p>
<h4 id="...-and-more" tabindex="-1">... and more</h4>
<p>In general, I've found a community deeply involved in having an impact on real-world use-cases (the <em>In-Use track</em> was very interesting), as well as in openly sharing results and resources (there is actually a <em>Resources Track</em>!).</p>
<p>If you want to know more, I would urge you to look in
<a href="https://link.springer.com/book/10.1007%2F978-3-030-21348-0">the proceedings</a>
and also read
<a href="https://thinklinks.wordpress.com/2019/06/28/trip-report-eswc-2019/">the trip-report by Paul Groth</a>.</p>
<h3 id="sigmod-2019" tabindex="-1">SIGMOD 2019</h3>
<figure class="body-figure">
<img src="https://people.cs.aau.dk/~matteo/images/SIGMOD19-logo.png" alt="SIGMOD 2019 Logo, Amsterdam">
<figcaption>
<a href="https://sigmod2019.org/">SIGMOD 2019</a>, <time datetime="2019-06-30">June 30</time> - <time datetime="2018-07-05">July 5</time>, in Amsterdam (Netherlands).
</figcaption>
</figure>
<p>Only a few weeks after ESWC, I visited Amsterdam for the
<a href="https://sigmod.org/">International Conference on Management of Data (SIGMOD)</a>.
I was there with my <del>partners in crime</del> colleagues
<a href="http://mott.in/">Davide Mottin</a>,
<a href="http://helios.mi.parisdescartes.fr/~themisp/home.html">Themis Palpanas</a>,
and
<a href="https://velgias.github.io/">Yannis Velegrakis</a>
to present an updated and much extended version of our
<a href="https://people.cs.aau.dk/~matteo/publications/tutorial/2019-sigmod-exemplar-tutorial.html">tutorial on exploratory search</a>,
the contents of which are based on our recent book on
<a href="http://data-exploration.ml/">«Data Exploration using Example-based Methods»</a>.</p>
<p>The venue, the organization, and the conference as a whole were outstanding!
Not to mention the
<a href="https://thinklinks.files.wordpress.com/2019/07/bkbnl49c-1.jpg">best conference badges</a> I've ever seen.</p>
<p>The conference is a leading venue when it comes to databases and data management, and had a massive audience of more than 1000 people.
I cannot hope my insight will do justice to the wide array of important topics covered.
I would invite you once more to look at
<a href="https://dlnext.acm.org/doi/10.1145/3299869">the conference proceedings</a>
and to
<a href="https://thinklinks.wordpress.com/2019/07/15/trip-report-sigmod-pods-2019/">the other trip report, again by Paul Groth</a>.
Below I will highlight just a few things.</p>
<h4 id="graphs%2C-graphs%2C-graphs..." tabindex="-1">Graphs, graphs, graphs...</h4>
<p>There were 3 sessions about graphs, and graphs appeared also in other contexts as well.
Some papers and demos I took note of are (in no particular order):</p>
<ul>
<li>PRSim: Sublinear Time SimRank Computation on Large Power-Law Graphs, by Wei et al.</li>
<li>Experimental Analysis of Streaming Algorithms for Graph Partitioning, by Pacaci and Özsu</li>
<li>Fractal: A General-Purpose Graph Pattern Mining System, by Dias et al.</li>
<li>Interactive Graph Search, by Tao et al.</li>
<li>Optimizing Declarative Graph Queries at Large Scale, by Zhang et al.</li>
<li>Efficient Subgraph Matching: Harmonizing Dynamic Programming, Adaptive Matching Order, and Failing Set Together, by Han et al.</li>
<li>CECI: Compact Embedding Cluster Index for Scalable Subgraph Matching, by Bhattarai et al.</li>
<li>Large Scale Graph Mining with G-Miner, by Chen et al.</li>
<li>NeMeSys - A Showcase of Data Oriented Near Memory Graph Processing, by Krause et al.</li>
<li>NAVIGATE: Explainable Visual Graph Exploration by Examples, by Namaki et al.</li>
</ul>
<p>As a side note, I particularly enjoyed the style of the «Interactive Graph Search» presentation: it actually embodied the idea that a paper presentation should provide the important information about the research, leaving you with an honest desire to read the paper for more details.</p>
<p>In general, <em>graphs are really ubiquitous and they continue to be a prolific field of study</em> (read below also about
<a href="https://people.cs.aau.dk/~matteo//notes/conferences-2019.html#vldb-2019">the VLDB keynote by Tamer Özsu</a>).
Systems tackling some of the important challenges of graph data management are presented every year at all the relevant venues.
I'm wondering though whether we - as a community - could do more to have these papers translate to solutions adopted in practice.
This is a well known issue in many areas, but now that graph data management is in the spotlight, I feel we have a unique opportunity to have real impact.</p>
<p>On the general topic of graphs in the real world, the 12th Technical User Community (TUC) meeting of the LDBC council took place co-located with SIGMOD.
On the program you can see
<a href="http://ldbcouncil.org/blog/12th-tuc-meeting-sigmod-2019-amsterdam-july-5-2019-agenda-update">quite a packed schedule</a>
with members from both industry and academia discussing graph database systems, graph benchmarking, and query languages.
It was great and inspiring!</p>
<p>Among the various presentations, my personal highlight was the talk by Vasileios Trigonakis (Oracle) on experiences and limitations in evaluating their distributed graph query engine with LDBC.
My personal takeaway is that
<a href="http://people.cs.aau.dk/~matteo/publications/journal/2018-vldb-gdb.html">we need to work on micro-benchmarks to complement existing benchmarks and for an in-depth understanding of graph databases systems performance</a>.
This topic will come back later in my notes about
<a href="https://people.cs.aau.dk/~matteo//notes/conferences-2019.html#vldb-2019">this year VLDB as well</a>.</p>
<h4 id="information-discovery-and-what-is-interesting" tabindex="-1">Information Discovery and what is Interesting</h4>
<p>At one of the industry sessions, there were two papers about the automatic discovery of <em>interesting insights</em>.
The first, «Quick Insight: Interesting pattern discovery», by Justin Ding et al. at Microsoft, defined a collection of <em>interesting patterns</em>, which they called insights.
For example a rising trend, or an outlier among a set of otherwise similar data-points.
Then they devised a systematic mining framework to discover such patterns efficiently and integrated that in one of their products.
Among other things, they also devised methods to <em>skip</em> patterns that are trivial (e.g., a linear correlation between two values between which a functional dependency exists).</p>
<p>The second paper, by Flip Korn et al. presented by Cong Yu (Google), was about «Automatically Generating Interesting Facts from Wikipedia Tables».
They use this method to augment entity knowledge panels, i.e., those small panels of information shown when we perform an entity-centric search, with <em>fun facts</em>.
For instance, if the user performs a search for the movie <em>Frozen</em>, the knowledge panel, which provides facts like the release date and the director, could also show that <em>«Frozen is the
highest-grossing animated film of all time»</em>.
Such facts are extracted from <em>superlative tables</em> in Wikipedia, e.g., the table of the top 10 highest-grossing animated movies.</p>
<p>The general premise of both methods is that interestingness can be more easily defined as something unconventional or special (an outlier, dominance in some ranking, etc.).
This is understandable: it caters to a universal definition of interestingness, which makes it a safer bet when it comes to <em>fun facts</em>.</p>
<p>Yet, this does not tap into the user's information need and intention, except for what is <em>explicit</em> in the query.
When dealing with <em>data exploration</em>, identifying the user's intention can help propose <em>exploratory directions</em> that let the user find more relevant information and better understand the information available.
For instance, Frozen is based on a story by Hans Christian Andersen, like at least 32 other movies;
this could be a <em>non-trivial</em> dimension worth exploring (compared to suggesting movies with the same director) even if it doesn't represent anything exceptional.</p>
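<p>To make the "unconventional or special" notion of interestingness concrete, here is a minimal sketch (not the algorithm from either paper) of flagging an outlier among otherwise similar data points, using a robust median-absolute-deviation score; the data, function names, and threshold are all hypothetical.</p>

```python
from statistics import median

def outlier_insights(points, threshold=3.5):
    """Return the (label, value) pairs that deviate strongly from the rest.

    Uses the robust z-score 0.6745 * |x - median| / MAD, which, unlike a
    mean/stdev test, is not diluted by the outlier itself.
    """
    values = [v for _, v in points]
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:  # all points (essentially) identical: nothing special
        return []
    return [(label, v) for label, v in points
            if 0.6745 * abs(v - med) / mad > threshold]

# A toy series where one quarter dominates all others.
sales = [("Q1", 102), ("Q2", 98), ("Q3", 101), ("Q4", 340)]
print(outlier_insights(sales))  # [('Q4', 340)]
```

<p>A real insight-mining framework would of course run many such pattern detectors (trends, dominance, seasonality, ...) over all subspaces of the data and rank the results; this sketch only shows the single "outlier" pattern.</p>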
<h3 id="sigir-2019" tabindex="-1">SIGIR 2019</h3>
<figure class="body-figure">
<img src="https://people.cs.aau.dk/~matteo/images/SIGIR19-logo.png" alt="SIGIR 2019 Logo, Paris">
<figcaption>
<a href="https://sigir.org/sigir2019/">SIGIR 2019</a>, <time datetime="2019-07-21">July 21</time>-<time datetime="2019-07-25">25</time>, in Paris (France).
</figcaption>
</figure>
<p>We presented a special version of our tutorial, titled
<a href="http://data-exploration.ml/#sigir">«Example-driven Search: a New Frontier for Exploratory Search»</a>
at the
<a href="https://sigir.org/sigir2019/">42nd International ACM SIGIR Conference on Research and Development in Information Retrieval</a> in Paris.</p>
<p>Our goal was to do our part in fostering collaborations on the topic of <em>exploratory search</em> in the intersection between information retrieval, data management, and data mining.
We focused on exploratory analysis techniques based on examples that can be easily applied to improve or extend tools and systems for advanced information retrieval applications with both structured and non-structured data.</p>
<p>While in the database community «Query by Example» by Zloof (1975) is the seminal work,
in the IR community <em>querying by example documents</em> has its roots in the «Relevance Feedback in Information Retrieval» work by Rocchio (1971) and in the general idea of <em>query by content</em>, which later found extensive success in, for example, image retrieval.</p>
<p>As a side note, in parallel to our tutorial, another one was taking place about
<a href="http://ltr-tutorial-sigir19.isti.cnr.it/slides/">Learning to Rank in Theory and Practice: From Gradient Boosting to Neural Networks and Unbiased Learning</a>.
I didn't manage to attend it, but I heard it was great and I'm planning to study those slides soon.</p>
<p>I also invite you to read the trip reports by
<a href="https://ws-dl.blogspot.com/2019/07/2019-07-30-sigir-2019-in-paris-trip.html">Michele Weigle</a>
and the
<a href="https://docs.google.com/document/d/1cVC6cbvZkVw4SYJSwP-YqRIf5XHBwvU2_7ps3SkZ1x0/edit#heading=h.8ehsssll2htn">real-time notes by Jeff Dalton (on Gdocs)</a>.</p>
<p>Also, the welcome reception
<a href="https://www.mnhn.fr/en/visit/lieux/grande-galerie-evolution-gallery-evolution">inside the Muséum d'Histoire Naturelle</a>,
with the museum all for us and the effects from the storm simulator, was really something else!</p>
<h4 id="interactive-search%2C-knowledge-graphs%2C-and-explainable-recommendations" tabindex="-1">Interactive Search, Knowledge Graphs, and Explainable recommendations</h4>
<p>The opening keynote of the conference was by
<a href="http://ciir.cs.umass.edu/croft">Bruce Croft</a>
and was on
<a href="https://sigir.org/sigir2019/program/keynotes/">the importance of interaction for information retrieval</a>.
My main takeaways from the keynote are:</p>
<ul>
<li>Interaction is necessary for effective information access.</li>
<li>Traditional Search puts the burden on the user (in specifying what they are looking for).</li>
<li>To overcome this limitation, we require a system that explicitly models interaction and user intent.</li>
<li>That will allow obtaining personalized browsing and guided assistance, especially for exploratory search.</li>
</ul>
<p>Given that I was there to talk about exploratory search, much of the contents of the keynote resonated deeply with me.
Bruce also described iterative search as a process where the system can show some examples and ask what looks relevant and what does not.
Moreover, he highlighted <em>exploratory search as a dynamic ongoing process that has to be modeled as a whole</em>.
In particular, he argued, the system should model explicitly the history of the search,
and should be able to ask clarifying questions when not confident in the answer.
Referencing the seminal work by
<a href="https://dl.acm.org/citation.cfm?id=1121979">Marchionini (2006)</a>,
he stressed that, to support knowledge understanding, the focus should move beyond “<em>one-shot retrieval</em>” and move toward “<em>intent-aware response retrieval</em>”.</p>
<p>Knowledge Graphs also had significant representation at SIGIR.
Some of the works that caught my attention were:</p>
<ul>
<li>Network Embedding and Change Modeling in Dynamic Heterogeneous Networks, by Bial et al.</li>
<li>Embedding Edge-attributed Relational Hierarchies, by Chen and Quirk.</li>
<li>M-HIN: Complex Embeddings for Heterogeneous Information Networks via Metagraphs, by Fang et al.</li>
<li>A Scalable Virtual Document-Based Keyword Search System for RDF Datasets, by Dosso and Silvello.</li>
<li>Personal Knowledge Base Construction from Text-based Lifelogs, by Yen et al.</li>
<li>ENT Rank: Retrieving Entities for Topical Information Needs through Entity-Neighbor-Text Relations, by Dietz.</li>
</ul>
<p>I'm really fascinated by the idea of personal knowledge bases (and personal knowledge graphs),
and I think embedding methods for KGs are still in their early stages and miss quite a lot of the potential a KG has to offer.
<em>If you are interested in these topics, and have read so far, we should definitely talk!</em></p>
<p>On a completely different topic, I found the work by Balog and colleagues on
<a href="http://krisztianbalog.com/files/sigir2019-transrec.pdf">«Transparent, Scrutable and Explainable User Models for Personalized Recommendation»</a>
extremely compelling.
They demonstrate how a set-based recommendation technique, which is simple to understand, allows the user model (that is the reasoning behind the recommendation) to be explicitly presented to users in natural language.
For example, their approach produces recommendation explanations like “<em>You don't like movies that are tagged as <u>adventure</u>, unless they are tagged as <u>thriller</u>, such as Twister.</em>”.
This, in turn, allows explainable recommendations and enables the user to provide feedback to the system (to improve the model).
Moreover, all this without any significant loss in the quality of the recommendations!</p>
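<p>To illustrate what "explainable by construction" can look like, here is a minimal, hypothetical sketch in the spirit of (but deliberately much simpler than) such a set-based approach: the user model is just a per-tag score aggregated from rated items, so the model itself can be verbalized directly as an explanation. All names and data are invented for illustration.</p>

```python
def tag_preferences(ratings, item_tags):
    """Aggregate +1/-1 item ratings into an average score per tag.

    ratings:   {item: +1 or -1}
    item_tags: {item: set of tags}
    returns:   {tag: average rating of items carrying that tag}
    """
    scores = {}
    for item, r in ratings.items():
        for tag in item_tags[item]:
            scores.setdefault(tag, []).append(r)
    return {t: sum(v) / len(v) for t, v in scores.items()}

def explain(prefs):
    """Verbalize the user model itself: the model is the explanation."""
    liked = sorted(t for t, s in prefs.items() if s > 0)
    disliked = sorted(t for t, s in prefs.items() if s < 0)
    return (f"You like movies tagged as {', '.join(liked)}; "
            f"you don't like movies tagged as {', '.join(disliked)}.")

ratings = {"Twister": 1, "Up": 1, "SlowFilm": -1}
item_tags = {"Twister": {"thriller"}, "Up": {"adventure"},
             "SlowFilm": {"drama"}}
prefs = tag_preferences(ratings, item_tags)
print(explain(prefs))
```

<p>Because the explanation is read straight off the model, a user who disagrees with it ("actually, I do like drama") can correct the corresponding tag score directly, which is exactly the feedback loop that post-hoc explanations of a black-box model cannot offer.</p>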
<p>I think this approach of building models that are <em>explainable by construction</em>, instead of trying to concoct post-hoc explanations of why some involved neural network hallucinated some recommendation, is a highly promising direction.</p>
<h3 id="vldb-2019" tabindex="-1">VLDB 2019</h3>
<figure class="body-figure">
<img src="https://people.cs.aau.dk/~matteo/images/VLDB19-logo.png" alt="VLDB 2019 Logo, Los Angeles">
<figcaption>
<a href="https://vldb.org/2019/">VLDB 2019</a>, <time datetime="2019-08-26">August 26</time>-<time datetime="2018-08-30">30</time> in Los Angeles (U.S.A.).
</figcaption>
</figure>
<p>Last, but not least, I also had the opportunity to cross the ocean and join the other premier venue for databases and data management,
<a href="https://vldb.org/2019/">the 45th International Conference on Very Large Data Bases (VLDB)</a>.</p>
<p>I was there with my colleague
<a href="https://martinbrugnara.it/">Martin Brugnara</a>
to present the work we did together with
<a href="https://velgias.github.io/">Yannis Velegrakis</a>
on
<a href="https://graphbenchmark.com/">benchmarking graph databases</a>.
In this work, <em>we present the first micro-benchmarking framework for the assessment of the functionalities of existing graph database systems</em>.
Moreover, we provide a comprehensive study of the existing systems in order to understand their capabilities and limitations.</p>
<p>We strongly believe this work to be particularly timely: as graph database systems reach maturity and widespread use, we can start discussing in detail their architectures, alongside the advantages and drawbacks of the various implementation alternatives.
The presentation at
<a href="https://people.cs.aau.dk/notes/conferences-2019.html#SIGMOD2019">the LDBC event at SIGMOD</a>
I discussed above and the keynote by Tamer Özsu at this VLDB strongly reinforced this belief.</p>
<h4 id="keynotes%3A-self-driving-databases%2C-graph-data-management-systems%2C-and-getting-rid-of-data" tabindex="-1">Keynotes: Self-driving Databases, Graph Data Management Systems, and getting rid of data</h4>
<p>Before moving to the opening keynote for the main conference, let me just have a pointer here to the keynote by Andy Pavlo at the
<a href="https://sites.google.com/view/aidb2019/home/workshop-program">“AI for Databases” workshop</a>/.
Andy presented his experiences on
<a href="https://www.youtube.com/watch?v=9Zy8aztUCxA">the challenges of self-driving databases</a>.
As usual, thanks to his
<a href="https://sigmodrecord.org/publications/sigmodRecord/1503/pdfs/09_profiles_Pavlo.pdf">unconventional style</a>, it was a quite enjoyable and informative talk.
Not surprisingly, moving towards a data management system that adapts in almost-real-time to changes in workload and data at scale requires overcoming quite a few obstacles.
I warmly invite you to watch the talk recording at the link above and let Andy explain the details better than I could.</p>
<p>The opening keynote at the main conference instead was on the open problems in graph processing (
<a href="https://vldb2019.github.io/files/VLDB19-keynote-1-slides.pdf">slides</a>
).
Prof. Özsu opened by showing the multiple disciplines and research areas across which graph research is fragmented.
In particular, knowledge graphs and the semantic web, graph DBMS, and graph analytics systems.
His keynote, as well as his research expertise, covered a great deal of topics, including RDF Engines, Graph DBMSs, Graph Analytics Systems, and Dynamic & Streaming Graphs.
His recent work,
<a href="https://dl.acm.org/citation.cfm?id=3164139">presented in the past edition of VLDB</a>, was exactly about the fact that graphs are everywhere, and not just social networks: products, web, financial, infrastructure, and knowledge graphs as well.
This great deal of domains is matched with a corpus of methods and approaches of comparable size.
He also addressed the eternal debate between scale-out and scale-up for graph management solutions, arguing that the gigantic size of today's graph datasets, along with the rich amount of non-trivial information they store, can only be met by scale-out strategies.
This argument, as well as the opposing view, has been expressed quite eloquently in a pair of articles:
<a href="https://ieeexplore.ieee.org/document/8379528">«Scale Up or Scale Out for Graph Processing?»</a>
and the corresponding
<a href="https://ieeexplore.ieee.org/document/8481644">response article</a>
in the IEEE Internet Computing journal.</p>
<p>Speaking of RDF systems, Özsu's analysis suggests that a 1-to-1 mapping to the relational model (the single table approach for instance) is not ideal.
The open problems in this area comprise how to scale out, what is the best storage architecture, full implementation of the SPARQL query language, the computational cost of entailment in the Semantic Web, ensuring data quality and efficient RDF data cleaning, handling streaming RDF data, and the new challenges of having RDF data management embedded in IoT devices.
Moreover, he suggested that the DB community could get more involved with the Semantic Web community (and vice versa) in the study and development of performant RDF management systems (this reminds me of
<a href="https://ruben.verborgh.org/articles/the-semantic-web-identity-crisis/">Ruben Verborgh's opinion on the overlooked 20% of engineering effort in the Semantic Web</a>).
<em>I, for one, am highly interested in doing my part in this effort!</em></p>
<p>Speaking of graph data management systems (GDBMSs), the main difference from the Semantic Web is the <em>property graph model</em>, a model where properties are directly associated with edges and nodes.
These systems are highly optimized for online workloads (OLTP), and the workloads for GDBMSs are usually more skewed towards traversal queries (e.g., paths and reachability), as well as subgraph search as in RDF triplestores.
The open issues he highlighted were on the current graph query processing techniques.
In particular, if I understood correctly, his view is that the strategy to process structure and data separately is likely to provide sub-optimal performance.
He also stressed how the poor locality in graph workloads renders traditional caching techniques less effective.
Moreover, as for triplestores, it is unclear what storage system works best.
In particular he highlighted how there is too much focus on homogeneous graphs (graphs with a single edge type and maybe without attributes).
Finally, he called for more work on benchmarking, and in particular micro-benchmarking, in order to understand how each component of a GDBMS works and performs under different circumstances.
It goes without saying that I was quite happy to point him to our work on the
<a href="https://people.cs.aau.dk/~matteo/publications/journal/2018-vldb-gdb.html">graph database micro-benchmark</a>.</p>
<p>I am quite less acquainted with the literature on graph analytics and graph streaming.
So this part of the talk was quite instructive, especially regarding the limitations of applying map-reduce directly and the differences between the Bulk Synchronous Parallel (BSP) and Gather-Apply-Scatter (GAS) paradigms.
He also pointed out quite a gap in the exploration of the design space beyond these two paradigms.
As for open issues he pointed to OLAP style processing on graphs, integration in data science workflows, and the current need to support ML workloads over graphs.
The topic of streaming graphs seems to have received more attention only recently, and he provided a compelling distinction between dynamic graphs and streaming graphs.
In the first case we want to keep the whole picture up-to-date; in the second we instead keep a window of changes (insertions/deletions) and reason only within that window.</p>
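<p>The windowed view of a streaming graph can be sketched in a few lines; this is a hypothetical toy (not from the keynote) where only the edges that arrived within a recent time window are retained and queried.</p>

```python
from collections import deque

class WindowedGraph:
    """Keep only the edge events that fall inside a sliding time window."""

    def __init__(self, window):
        self.window = window   # width of the time window
        self.events = deque()  # (timestamp, source, target), in arrival order

    def add_edge(self, t, u, v):
        self.events.append((t, u, v))
        # Expire every edge whose timestamp fell out of the window.
        while self.events and self.events[0][0] <= t - self.window:
            self.events.popleft()

    def neighbors(self, node):
        """Answer a query against the current window only."""
        return {v for _, u, v in self.events if u == node}

g = WindowedGraph(window=10)
g.add_edge(1, "a", "b")
g.add_edge(5, "a", "c")
g.add_edge(12, "a", "d")          # the edge (a, b) from t=1 has now expired
print(sorted(g.neighbors("a")))   # ['c', 'd']
```

<p>A dynamic-graph system would instead apply each insertion/deletion to a persistent, fully up-to-date graph; here nothing outside the window survives, which is precisely the distinction drawn above.</p>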
<p>The third keynote that I want to highlight was by Tova Milo on Getting Rid of Data
(
<a href="https://vldb2019.github.io/files/VLDB19-keynote-2-slides.pdf">slides</a>
).
The idea is simple to present: we are producing too much data!
Not only is it already infeasible to store all of it, but most of it is also of no use, so it is better to keep only the <em>most useful parts</em> (for some definition of useful).</p>
<p>The concept was not new to me: at
<a href="https://people.cs.aau.dk/~matteo/publications/demo/2014-sigmod-exq.html">SIGMOD'14</a>
Martin Kersten was also explaining his idea of a
<a href="https://www.monetdbsolutions.com/sites/default/files/CIDR2015Kersten.pdf">«big data fungus»</a> that would consume the stale, unused data in the system.</p>
<p>The idea seems outrageous, but Prof. Milo introduced an extremely convincing picture.
She argued that <em>not all data is equally important</em>, and distinguished between</p>
<ul>
<li>critical data, which cannot be lost at any time,</li>
<li>important data that we need but we can live without (maybe can be recomputed at some cost), and</li>
<li>potentially important data, which may be important, but not right now.</li>
</ul>
<p>Given that, if we do not pick ourselves which data to forfeit, the circumstances will decide for us, she moved on to the challenges we need to face:
in particular, the balance between size and importance, the question of which data is easy to summarize and how to summarize it, and finally how to automatically generate a data disposal policy.
Particularly interesting was the idea of an automated exploration agent (deep reinforcement learning?) that would move around and try to discover data that may be important and data that can be disposed of.</p>
<h4 id="other-interesting-tidbits%3A-data-lakes%2C-timely-dataflow%2C-and-data-exploration" tabindex="-1">Other interesting tidbits: Data lakes, Timely Dataflow, and Data Exploration</h4>
<p>The other topics that attracted my attention at this conference were:</p>
<ul>
<li>The
<a href="https://rjmillerlab.github.io/data-lake-tutorial-slides/">tutorial on data lakes</a>;
</li>
<li>A work
<a href="https://dl.acm.org/citation.cfm?id=3355532">by Lai et al</a>
that studied how to do distributed subgraph matching on
<a href="https://github.com/TimelyDataflow/timely-dataflow">Timely Dataflow</a>;
</li>
<li>Optimization for Active Learning-based Interactive Database Exploration,
<a href="https://dl.acm.org/citation.cfm?id=3300970">by Huang et al</a>;
</li>
<li>A vision paper on Exploring Change: A New Dimension of Data Analytics,
<a href="https://hpi.de/naumann/projects/data-profiling-and-analytics/change-exploration.html">by Bleifuß et al</a>;
</li>
<li>Example-Driven Query Intent Discovery: Abductive Reasoning using Semantic Similarity,
<a href="https://dl.acm.org/citation.cfm?id=3360342">by Fariha and Meliou</a>;
</li>
<li>The poster of the VLDBJ survey on
<a href="https://link.springer.com/article/10.1007%2Fs00778-018-0528-3">Summarizing Semantic Graphs</a>;
</li>
<li>As well as a very interesting demo on
<a href="https://hal.inria.fr/hal-02152844">A Modular Framework for Analytical Exploration of RDF Graphs</a>.
</li>
</ul>
<h3 id="closing-remarks" tabindex="-1">Closing remarks</h3>
<p>To summarize this quite intense conference season, I'm happy to report on the <em>ubiquity of Knowledge Graphs</em>.
They are adopted in multiple forms to <em>enhance</em> systems and algorithms in many tasks.
In my view KGs are the perfect tool to build such <em>intent aware data exploration systems</em> that can help us find our route in this immense sea of data.
At the same time, KGs, being graphs, require efficient graph data management systems.
I have the feeling that such systems will arise from the intersection of triplestores and property graphs.</p>
<p>Participating in such important events was a privilege for me, but I cannot avoid thinking of the number of planes I took and the impact they had.
As highlighted by many sources, among them
<a href="https://www.sigplan.org/Resources/Climate/">the SIGPLAN initiative on Climate Change</a>
«<em>Air travel is a significant source of greenhouse gas emissions, which in turn are a significant contributor to climate change</em>».</p>
<p>I've already read or heard somewhere the idea that some of the major conferences should join forces and maybe arrange to be in the same place at almost the same time, or even take place only once every other year.
I'm not sure what would be the best solution, but for sure I'll start being more mindful of the issue.</p>
ML-for-DB: what to cover2019-08-29T00:00:00Zhttps://people.cs.aau.dk/notes/ml-for-db-checklist.html<!-- 10-2023 -->
<blockquote class="important">
<p>
Here's my non-exhaustive checklist I think every good ML-for-DB paper should have, especially query optimization:
</p><ul>
<li>Tails: not just an average. Show the distribution or at least 90/95/99%</li>
<li>Query performance: show me that the queries actually get faster</li>
<li>Overhead: how much does training and inference cost?</li>
<li>Optimals: if possible, how close are you to the optimal prediction / latency?</li>
<li>Comparisons: existing (+ commercial, when possible), other learned, naive approaches</li>
<li>Failures: Show me systematic failures. If you think there aren't any, look harder</li>
<li>Pareto analysis: show me tradeoffs. No way you're better at everything.</li>
<li>Interpretation: what did the model learn? Examine the weights. Examine especially successful cases. It's ok to hypothesize!</li>
</ul>
<p>None of the ML-for-systems papers at this VLDB (including my own!) check all these boxes. Let's raise the bar!</p>
<footer>
— <cite><a href="https://twitter.com/RyanMarcus/status/1167178742042488833">Tweets by Ryan Marcus (@RyanMarcus)</a></cite>
</footer>
</blockquote>
Example-based Search: a New Frontier for Exploratory Search2019-07-01T00:00:00Zhttps://people.cs.aau.dk/publications/tutorial/2019-sigir-exemplar-tutorial.html<h1 id="example-based-search%3A-a-new-frontier-for-exploratory-search" tabindex="-1">Example-based Search: a New Frontier for Exploratory Search</h1>
<h2 id="matteo-lissandrini%2C-davide-mottin%2C-themis-palpanas%2C-yannis-velegrakis" tabindex="-1">Matteo Lissandrini, Davide Mottin, Themis Palpanas, Yannis Velegrakis</h2>
<div class="box-special">
<p>
The content of this tutorial has been expanded in a book: <br><a href="https://people.cs.aau.dk/~matteo/publications/book/2018-mc-exemplar.html" class="call-to-action">Find out more!</a>
</p>
<p>You can also visit the official website: <a href="https://data-exploration.ml/">data-exploration.ml</a></p>
</div>
<h3 id="abstract%3A" tabindex="-1">Abstract:</h3>
<section class="intro secsubead">
<p>
Exploration is one of the primordial ways to accrue knowledge about the world and its nature.
As we accumulate, mostly automatically, data at unprecedented volumes and speed, our datasets have become complex and hard to understand.
In this context <em>exploratory search</em> provides a handy tool for progressively gathering the necessary knowledge, starting from a tentative query that hopefully leads to answers that are at least partially relevant and that can provide cues about the next queries to issue.
An exploratory query should be simple enough to avoid complicated declarative languages (such as SQL) and mechanisms, and at the same time retain the flexibility and expressiveness required to express complex information needs.
Recently, we have witnessed a rediscovery of the so-called <em>example-based methods</em>, in which the user, or the analyst, circumvents query languages by using examples as input.
This shift in semantics has led to a number of methods receiving as query a set of example members of the answer set.
The search system then infers the entire answer set based on the given examples and any additional information provided by the underlying database.
In this tutorial, we present an excursus over the main Example-based methods for Exploratory Search.
We show how different data types require different techniques, and present algorithms that are specifically designed for relational, textual, and graph data.
We conclude by providing a unifying view of this query-paradigm and identify new exciting research directions.
</p>
<p>
<b>The tutorial has been presented on Sunday, July 21st, at SIGIR'19</b>
<a href="https://sigir.org/sigir2019/program/tutorials/">See the official program.</a>
</p>
</section>
<h3 id="cite%3A" tabindex="-1">Cite:</h3>
<blockquote itemscope="" itemtype="http://schema.org/ScholarlyArticle">
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Lissandrini, Matteo</span>;
</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Mottin, Davide</span>;
</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Palpanas, Themis</span>;
</span>
and
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Velegrakis, Yannis</span>.
</span>
<br>
“<span itemprop="headline name">Example-based Search: a New Frontier for Exploratory Search</span>.”
<br>
<div class="hidden">
<time datetime="2019-07-21" itemprop="datePublished">July, 2019</time>
<span itemprop="image">https://people.cs.aau.dk/~matteo/images/xq-logo.png</span>
</div>
<span itemprop="isPartOf" itemscope="" itemtype="http://schema.org/Periodical">
<em><span itemprop="name">Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2019</span></em>
</span> (pp. <span itemprop="pageStart">1411</span>-<span itemprop="pageEnd">1412</span>).
</blockquote>
<pre><code class="lang-bibtex">
@inproceedings{Lissandrini:2019:ESN:3331184.3331387,
author = {Lissandrini, Matteo and Mottin, Davide and Palpanas, Themis and Velegrakis, Yannis},
title = {Example-based Search: A New Frontier for Exploratory Search},
booktitle = {Proceedings of the 42Nd International ACM SIGIR Conference on Research and Development in Information Retrieval},
series = {SIGIR'19},
year = {2019},
isbn = {978-1-4503-6172-9},
location = {Paris, France},
pages = {1411--1412},
numpages = {2},
doi = {10.1145/3331184.3331387},
acmid = {3331387},
publisher = {ACM},
address = {New York, NY, USA},
keywords = {data exploration, exploratory search, query paradigms},
}
</code></pre>
Exploring the Data Wilderness through Examples2019-06-30T00:00:00Zhttps://people.cs.aau.dk/publications/tutorial/2019-sigmod-exemplar-tutorial.html<h1 id="exploring-the-data-wilderness-through-examples" tabindex="-1">Exploring the Data Wilderness through Examples</h1>
<h2 id="davide-mottin%2C-matteo-lissandrini%2C-themis-palpanas%2C-yannis-velegrakis" tabindex="-1">Davide Mottin, Matteo Lissandrini, Themis Palpanas, Yannis Velegrakis</h2>
<div class="box-special">
<p>
The content of this tutorial has been expanded in a book: <br><a href="https://people.cs.aau.dk/~matteo/publications/book/2018-mc-exemplar.html" class="call-to-action">Find out more!</a>
</p>
<p>You can also visit the official website: <a href="https://data-exploration.ml/">data-exploration.ml</a></p>
</div>
<h3 id="abstract%3A" tabindex="-1">Abstract:</h3>
<section class="intro secsubead">
<p>
Exploration is one of the primordial ways to accrue knowledge about the world and its nature.
As we accumulate, mostly automatically, data at unprecedented volumes and speed, our datasets have become complex and hard to understand.
In this context <em>exploratory search</em> provides a handy tool for progressively gathering the necessary knowledge, starting from a tentative query that hopefully leads to answers that are at least partially relevant and that can provide cues about the next queries to issue.
An exploratory query should be simple enough to avoid complicated declarative languages (such as SQL) and mechanisms, and at the same time retain the flexibility and expressiveness of such languages.
Recently, we have witnessed a rediscovery of the so-called <em>example-based methods</em>, in which the user, or the analyst, circumvents query languages by using examples as input.
This shift in semantics has led to a number of methods receiving as query a set of example members of the answer set.
The search system then infers the entire answer set based on the given examples and any additional information provided by the underlying database.
In this tutorial, we present an excursus over the main example-based methods for exploratory analysis.
We show how different data types require different techniques, and present algorithms that are specifically designed for relational, textual, and graph data.
We conclude by providing a unifying view of this query-paradigm and identify new exciting research directions.
</p>
<p>
<b>The tutorial has been presented on Sunday, June 30th, at SIGMOD'19</b>
<a href="https://sigmod2019.org/sigmod_tutorials/">See the official program.</a>
</p>
</section>
<h3 id="cite%3A" tabindex="-1">Cite:</h3>
<blockquote itemscope="" itemtype="http://schema.org/ScholarlyArticle">
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Mottin, Davide</span>;
</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Lissandrini, Matteo</span>;
</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Palpanas, Themis</span>;
</span>
and
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Velegrakis, Yannis</span>.
</span>
<br>
“<span itemprop="headline name">Exploring the Data Wilderness through Examples</span>.”
<br>
<div class="hidden">
<time datetime="2019-06-30" itemprop="datePublished">June, 2019</time>
<span itemprop="image">https://people.cs.aau.dk/~matteo/images/xq-logo.png</span>
</div>
<span itemprop="isPartOf" itemscope="" itemtype="http://schema.org/Periodical">
<em><span itemprop="name">Proceedings of the 2019 International Conference on Management of Data, SIGMOD 2019</span></em>
</span> (pp. <span itemprop="pageStart">2031</span>-<span itemprop="pageEnd">2035</span>).
</blockquote>
<pre><code class="lang-bibtex">
@inproceedings{Mottin:2019:EDW:3299869.3314031,
author = {Mottin, Davide and Lissandrini, Matteo and Velegrakis, Yannis and Palpanas, Themis},
title = {Exploring the Data Wilderness Through Examples},
booktitle = {Proceedings of the 2019 International Conference on Management of Data},
series = {SIGMOD '19},
year = {2019},
isbn = {978-1-4503-5643-5},
location = {Amsterdam, Netherlands},
pages = {2031--2035},
numpages = {5},
url = {http://doi.acm.org/10.1145/3299869.3314031},
doi = {10.1145/3299869.3314031},
acmid = {3314031},
publisher = {ACM},
address = {New York, NY, USA},
keywords = {data exploration, database usability, example-based methods, exploratory search, graph exploration},
}
</code></pre>
An Open Source Dataset and Ontology for Product Footprinting2019-06-04T00:00:00Zhttps://people.cs.aau.dk/publications/demo/2019-eswc-bonsai.html<h1 id="an-open-source-dataset-and-ontology-for-product-footprinting" tabindex="-1">An Open Source Dataset and Ontology for Product Footprinting</h1>
<h2 id="agneta-ghose%2C-katja-hose%2C-matteo-lissandrini%2C-bo-pedersen-weidema" tabindex="-1">Agneta Ghose, Katja Hose, Matteo Lissandrini, Bo Pedersen Weidema</h2>
<section class="links">
<p>
Download a digital copy (<a class="attachment" href="https://people.cs.aau.dk/~matteo/pdf/ESWC19-bonsai.pdf">PDF</a>)
or <a href="https://people.cs.aau.dk/~matteo/pdf/ESWC19-bonsai-poster.pdf">Poster (PDF)</a>
</p><p>This work has been awarded the <a href="https://2019.eswc-conferences.org/awards/">Best Poster award</a>.</p>
</section>
<h3 id="abstract%3A" tabindex="-1">Abstract:</h3>
<figure class="body-figure small right">
<img src="https://people.cs.aau.dk/~matteo//images/bonsai-logo.png" alt="BONSAI logo">
<figcaption>
<a href="https://bonsai.uno/">BONSAI</a>: An organisation dedicated to creating and maintaining a truly global Open Source Database for product footprinting.
</figcaption>
</figure>
<section class="intro secsubead">
Product footprint describes the environmental impacts of a product system.
To identify such impact, Life Cycle Assessment (LCA) takes into account the entire lifespan and production chain, from material extraction to final disposal or recycling.
This requires gathering data from a variety of heterogeneous sources, but current access to those is limited and often expensive.
The BONSAI project, instead, aims to build a shared resource where the community can contribute to data generation, validation, and management decisions.
In particular, its first goal is to produce an open dataset and an open source toolchain capable of supporting LCA calculations.
This will allow the science of life cycle assessment to be performed in a more transparent and more reproducible way, and will foster data integration and sharing.
Linked Open Data and semantic technologies are a natural choice for achieving this goal.
In this work, <a href="https://github.com/BONSAMURAIS/">we present the first results of this effort</a>: (1) the core of a comprehensive ontology for industrial ecology and associated relevant data; and (2) the first steps towards an RDF dataset and associated tools to incorporate several large LCA data sources.
</section>
<h3 id="cite%3A" tabindex="-1">Cite:</h3>
<blockquote itemscope="" itemtype="http://schema.org/ScholarlyArticle">
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Ghose, Agneta</span>;
</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Hose, Katja</span>;
</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Lissandrini, Matteo</span>;
</span>
and
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Weidema, Bo Pedersen</span>.
</span>
<br>
“<span itemprop="headline name">An Open Source Dataset and Ontology for Product Footprinting</span>.”
<div class="hidden">
<time datetime="2019-06-01" itemprop="datePublished">June, 2019</time>
</div>
<span itemprop="isPartOf" itemscope="" itemtype="http://schema.org/Periodical">
<em><span itemprop="name">Extended Semantic Web Conference, 2019. Satellite Events</span></em>
</span> (pp. <span itemprop="pageStart">75</span>-<span itemprop="pageEnd">79</span>).
</blockquote>
<pre><code class="lang-bibtex">
@InProceedings{GhoseHose2019,
author="Ghose, Agneta
and Hose, Katja
and Lissandrini, Matteo
and Weidema, Bo Pedersen",
editor="Hitzler
and Kirrane
and Hartig
and de Boer
and Vidal
and Maleshkova
and Schlobach",
title="An Open Source Dataset and Ontology for Product Footprinting",
booktitle="The Semantic Web: ESWC 2019 Satellite Events",
year="2019",
publisher="Springer International Publishing",
pages="75--79"
}
</code></pre>
Understanding Effect Sizes2019-04-01T00:00:00Zhttps://people.cs.aau.dk/notes/effect-size.html<!-- 10-2023 -->
<blockquote class="important">
<p>A picture which should be shown to every student in social science.<br>
When an article says something like <br>
"people in this category have a <em>significantly</em> higher level of X" with an effect size of 0.1 standard deviation, the effect is half as small as the one in the top picture.
</p>
<footer>
—
<cite>
<a href="https://twitter.com/page_eco/status/1090924043807543297">Tweet by Lionel Page (@page_eco)</a>
</cite>
</footer>
</blockquote>
<figure class="body-figure large">
<img src="https://people.cs.aau.dk/~matteo/images/tweet-effect-size.jpg" alt="Picture from @page_eco tweet comparing curves for standard deviation with different overlap">
<figcaption>
<a href="https://pbs.twimg.com/media/DyO7lhmUcAE90IB.jpg:large">@page_eco</a> image tweet: Understanding Effect Sizes
</figcaption>
</figure>
<section class="intro secsubhead">
<p>
It is interesting to think about all those experiments where we compare running times, or precision, or space efficiency between different methods across different queries.
</p>
</section>
<p class="after-heading">
For instance, I've often compared the average or median running time across 100 random queries with <em>method A</em> vs. <em>method B</em>, but what would it mean to compare effect sizes in this way?
</p>
Mining Patterns in Graphs with Multiple Weights2019-02-18T00:00:00Zhttps://people.cs.aau.dk/publications/journal/2019-dapdb-multi-weight-patterns.html<h1 id="mining-patterns-in-graphs-with-multiple-weights-(invited-article)" tabindex="-1">Mining Patterns in Graphs with Multiple Weights <small>(Invited Article)</small></h1>
<h2 id="giulia-preti%2C-matteo-lissandrini%2C-davide-mottin%2C-yannis-velegrakis" tabindex="-1">Giulia Preti, Matteo Lissandrini, Davide Mottin, Yannis Velegrakis</h2>
<section class="links">
<p>
Download a digital copy (<a class="attachment" href="https://people.cs.aau.dk/~matteo/pdf/DAPDB-2019-resum.pdf">PDF</a>)
or <a href="https://rdcu.be/bntMy">Read online</a>
</p><p>
The final authenticated version is available online at <a href="https://doi.org/10.1007/s10619-019-07259-w">10.1007/s10619-019-07259-w</a>.
</p><p>
This is an extended version of <a href="https://people.cs.aau.dk/~matteo/conference/2017-edbt-relevant-patterns.html">our EDBT'18 work on Multi-Weighted Pattern Mining</a>
</p>
</section>
<h3 id="abstract%3A" tabindex="-1">Abstract:</h3>
<section class="intro secsubead">
<p>Graph pattern mining aims at identifying structures that appear frequently in large graphs, under the assumption that frequency signifies importance. In real life, there are many graphs with weights on nodes and/or edges. For these graphs, it is fair that the importance (score) of a pattern is determined not only by the number of its appearances, but also by the weights on the nodes/edges of those appearances. Scoring functions based on the weights do not generally satisfy the apriori property, which guarantees that the number of appearances of a pattern cannot be larger than the frequency of any of its sub-patterns, and hence allows faster pruning.
Therefore, existing approaches employ other, less efficient, pruning strategies. The problem becomes even more challenging in the case of multiple weighting functions that assign different weights to the same nodes/edges.</p>
<p>In this work we propose a new family of scoring functions that respects the apriori property, and thus can rely on effective pruning strategies. We provide efficient and effective techniques for mining patterns in multi-weighted graphs, and we devise both an exact and an approximate solution. In addition, we propose a distributed version of our approach, which distributes the appearances of the patterns to examine among multiple workers. Extensive experiments on both real and synthetic datasets prove that the presence of edge weights and the choice of scoring function affect the patterns mined, and the quality of the results returned to the user. Moreover, we show that, even when the performance of the exact algorithm degrades because of an increasing number of weighting functions, the approximate algorithm performs well and with fairly good quality.
Finally, the distributed algorithm proves to be the best choice for mining large and rich input graphs.</p>
</section>
<h3 id="cite%3A" tabindex="-1">Cite:</h3>
<blockquote itemscope="" itemtype="http://schema.org/ScholarlyArticle">
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Preti, Giulia</span>;
</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Lissandrini, Matteo</span>;
</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Mottin, Davide</span>;
</span>
and
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Velegrakis, Yannis</span>.
</span>
<br>
“<span itemprop="headline name">Mining Patterns in Graphs with Multiple Weights</span>.”
<br>
<div class="hidden">
<time datetime="2019-02-18" itemprop="datePublished">February, 2019</time>
<span itemprop="image">https://people.cs.aau.dk/~matteo/images/graph-logo.png</span>
</div>
<span itemprop="isPartOf" itemscope="" itemtype="http://schema.org/Periodical">
<em><span itemprop="name">Distributed and Parallel Databases</span></em>
</span> (<span itemprop="pageStart">1</span>-<span itemprop="pageEnd">39</span>).
</blockquote>
<pre><code class="lang-bibtex">
@Article{Preti2019,
author="Preti, Giulia
and Lissandrini, Matteo
and Mottin, Davide
and Velegrakis, Yannis",
title="Mining patterns in graphs with multiple weights",
journal="Distributed and Parallel Databases",
year="2019",
month="Feb",
day="18",
issn="1573-7578",
doi="10.1007/s10619-019-07259-w",
url="https://doi.org/10.1007/s10619-019-07259-w"
}
</code></pre>
Setup of a Docker container running Virtuoso2019-01-22T00:00:00Zhttps://people.cs.aau.dk/notes/virtuoso-setup-on-docker.html<!-- 10-2023 -->
<figure class="body-figure small show-logo">
<img src="https://people.cs.aau.dk/~matteo/images/openlink-logo.png" alt="OpenLink Software logo">
<figcaption>
<a href="https://openlinksw.com/" target="_blank">OpenLink Virtuoso</a> is a Universal Server for RDF and more.
</figcaption>
</figure>
<section class="intro secsubhead">
<p>
Lately I've been using <a href="https://virtuoso.openlinksw.com/">Virtuoso</a> for running some <code>SPARQL</code>.
Here is my <em>quick setup</em>.
</p>
</section>
<p>I also provide a custom configuration file (for machines with larger memory), the setup for working with a RAM disk (for fast read-only data), a GitHub gist, and instructions for loading data.</p>
<div class="box-special">
<h3>Update <time datetime="2020-02-22">2020-02-22</time></h3>
<p>
The new docker image is at <code>openlink/virtuoso-opensource-7</code>.
The code snippets have been updated accordingly.
</p>
</div>
<h3 id="setup-docker-container-for-virtuoso" tabindex="-1">Setup Docker container for Virtuoso</h3>
<figure class="body-figure small show-logo">
<img src="https://people.cs.aau.dk/~matteo/images/docker_logo.png" alt="Docker Logo: the whale">
<figcaption>
<a href="https://www.docker.com/" target="_blank">Docker</a> allows operating-system-level virtualization, also known as <a href="https://en.wikipedia.org/wiki/Docker_(software)" target="_blank">"containerization"</a>.
</figcaption>
</figure>
<p>The people at OpenLink provide <a href="https://hub.docker.com/r/openlink/virtuoso-opensource-7">a Docker image for the open-source version of their software</a>.
We will pull that image, prepare a folder for our data (so that if we kill the container we do not lose the database), and a folder for the data <em>to be imported</em>.
I also provide a customized <code>virtuoso.ini</code> file.</p>
<pre><code class="language-bash">docker pull openlink/virtuoso-opensource-7:latest
mkdir -p database
cp virtuoso.ini.example database/virtuoso.ini
mkdir -p import
</code></pre>
<p>We run the container setting the <code>database</code> and <code>import</code> folders as volumes; here the container is named <code>vos</code>. Note that I do not use <code>--rm</code>, so that I can restart the container if I want; if you add <code>--rm</code>, the container will be removed automatically when it dies.</p>
<pre><code class="language-bash">docker run --name vos -d \
--volume `pwd`/database:/database \
-v `pwd`/import:/import \
-t -p 1111:1111 -p 8890:8890 -i openlink/virtuoso-opensource-7:latest
</code></pre>
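<p>Right after <code>docker run</code> the server may need a few seconds before it accepts connections. Below is a small helper I find handy (my own sketch, not part of the official image; it assumes <code>curl</code> is installed): it polls the HTTP endpoint until Virtuoso answers.</p>

```shell
# Sketch: poll the SPARQL endpoint until Virtuoso is ready (assumes curl).
wait_for_virtuoso() {
  local url=${1:-http://localhost:8890/sparql} tries=${2:-30} i=0
  until curl -sf -o /dev/null "$url"; do
    i=$((i + 1))
    [ "$i" -ge "$tries" ] && return 1   # give up after $tries attempts
    sleep 2
  done
}
```

<p>Then <code>wait_for_virtuoso &amp;&amp; echo "SPARQL endpoint is up"</code> only prints once the server responds.</p>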
<p>The commands above require a custom <code>virtuoso.ini</code> file (provided here).
The main edits stem from my need to query a large dataset and to process large result sets.
More information on the parameters can be found <a href="http://docs.openlinksw.com/virtuoso/dbadm/#virtini">in the official documentation</a>.</p>
<p>My edits below are for a machine with <code>~64GB</code> of RAM, and may not be optimal in general, so YMMV.</p>
<ol>
<li>Allow access to the <code>/import</code> folder, where we place the files to be imported</li>
</ol>
<pre><code>DirsAllowed = ., /opt/virtuoso-opensource/vad, /import
</code></pre>
<ol start="2">
<li>Change the memory size thresholds: uncomment the following lines, and comment out the two corresponding lines below them (comments start with <code>;</code>)</li>
</ol>
<pre><code>NumberOfBuffers = 4000000
MaxDirtyBuffers = 3000000
;NumberOfBuffers = 10000
;MaxDirtyBuffers = 6000
</code></pre>
<p>A few lines earlier, you may also want to change the following:</p>
<pre><code>MaxQueryMem = 4G ; memory allocated to query processor
VectorSize = 2000 ; initial parallel query vector (array of query operations) size
MaxVectorSize = 20000000 ; query vector size threshold.
</code></pre>
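<p>As a rule of thumb (taken from the sizing hints in the stock <code>virtuoso.ini</code>, so treat the numbers as indicative, not exact): each buffer caches one 8&nbsp;KB database page, the buffer cache should take roughly two thirds of RAM, and <code>MaxDirtyBuffers</code> about three quarters of <code>NumberOfBuffers</code>. A quick back-of-the-envelope sketch to size them for your machine:</p>

```shell
ram_gb=64                                        # physical RAM of the machine, in GB
# Each buffer holds one 8 KB page; dedicate roughly 2/3 of RAM to the cache.
buffers=$(( ram_gb * 2 / 3 * 1024 * 1024 / 8 ))
# MaxDirtyBuffers is commonly set to about 3/4 of NumberOfBuffers.
dirty=$(( buffers * 3 / 4 ))
echo "NumberOfBuffers = $buffers"
echo "MaxDirtyBuffers = $dirty"
```

<p>On the 64&nbsp;GB machine this gives roughly 5.5M buffers; I use 4M to leave headroom for the query processor and the OS.</p>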
<ol start="3">
<li>
<p>Longer keep alive for large queries</p>
<pre><code>KeepAliveTimeout = 30
</code></pre>
</li>
<li>
<p>Allow for larger resultsets</p>
</li>
</ol>
<pre><code>ResultSetMaxRows = 50000
MaxQueryCostEstimationTime = 0 ; in seconds
MaxQueryExecutionTime = 600 ; in seconds
</code></pre>
<h4 id="a-gist" tabindex="-1">A Gist</h4>
<p><strong>The contents of this readme and of the ini file can be found in <a href="https://gist.github.com/kuzeko/5d53f9800a4b6d45006f0f9dc322ed07" target="_blank">this GitHub gist</a>.</strong></p>
<p>
<a href="https://gist.github.com/kuzeko/5d53f9800a4b6d45006f0f9dc322ed07#partial-new-comment-form-actions" class="call-to-action" target="_blank">Add a comment</a> there if you have any feedback.
</p>
<h3 id="to-use-a-ram-disk-(in-the-example-of-size-8gb)" tabindex="-1">To use a RAM Disk (in the example of size 8GB)</h3>
<p><strong>This is READ ONLY</strong>, to get faster query performance. All edits will be lost.</p>
<pre><code class="language-bash">sudo mkdir -p /media/ramdisk1
sudo mount -t tmpfs -o size=8192M tmpfs /media/ramdisk1
docker run --name vos-ram7 -d \
--volume /media/ramdisk1/database:/database \
-v `pwd`/import:/import \
-t -p 1111:1111 -p 8890:8890 -i openlink/virtuoso-opensource-7:latest
</code></pre>
<h3 id="run-the-cli" tabindex="-1">Run the CLI</h3>
<pre><code class="language-bash">docker exec -it vos isql 1111
</code></pre>
<h4 id="create-graphs" tabindex="-1">Create graphs</h4>
<pre><code class="language-sql">SPARQL create GRAPH &lt;http://www.purl.com/test/my_graph&gt;;
</code></pre>
<h4 id="import-data" tabindex="-1">Import data</h4>
<pre><code class="language-sql">delete from DB.DBA.load_list;
ld_dir ('/import', 'my_file.ttl', 'http://www.purl.com/test/my_graph');
rdf_loader_run ();
checkpoint;
</code></pre>
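<p>To verify that the loader actually processed your files, you can inspect the <code>load_list</code> table again: <code>ll_state</code> is <code>0</code> while queued, <code>1</code> while loading, and <code>2</code> when done, and <code>ll_error</code> is non-NULL for files that failed. A sketch (assuming the <code>vos</code> container from above is running):</p>

```shell
# Inspect the bulk-loader queue.
# ll_state: 0 = queued, 1 = loading, 2 = done; ll_error is non-NULL on failure.
status_sql='SELECT ll_file, ll_state, ll_error FROM DB.DBA.load_list;'
if command -v docker >/dev/null 2>&1; then
  docker exec -i vos isql 1111 exec="$status_sql" || true  # container may be down
fi
```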
<h4 id="use-the-cli-to-run-a-script" tabindex="-1">Use the CLI to run a script</h4>
<p>Assuming you have a script called <code>import.isql</code> in the local <code>import</code> folder, e.g., containing the <code>ld_dir</code> and <code>rdf_loader_run</code> commands above, you can run the following to execute that script.</p>
<pre><code class="language-bash">docker exec -it vos isql 1111 exec="LOAD /import/import.isql"
</code></pre>
<div class="box-special notice">
<h3>Warning <time datetime="2020-02-24">2020-02-24</time></h3>
<p>
The loader in the new docker image at <code>openlink/virtuoso-opensource-7</code>
has checkpointing disabled, and <strong>you risk not being able to access your data after you stop and restart the image</strong> (see this <a href="https://github.com/openlink/virtuoso-opensource/issues/900">GitHub issue for details</a>).
So remember to run the <code>checkpoint;</code> command after loading, as explained in <a href="http://vos.openlinksw.com/owiki/wiki/VOS/VirtBulkRDFLoader#Bulk%20loading%20process">point 6 of the official guide</a>.
</p>
</div>
<h4 id="check-existing-graphs" tabindex="-1">Check existing graphs</h4>
<pre><code class="language-sql">SPARQL
SELECT DISTINCT ?g
WHERE
{
GRAPH ?g {?s ?p ?t}
}
</code></pre>
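<p>To also check that the triples landed where expected, you can count them per graph with the same <code>isql</code> pattern (again a sketch, assuming the <code>vos</code> container is running):</p>

```shell
# Count triples per named graph through the isql CLI (container named "vos").
count_sql='SPARQL SELECT ?g (COUNT(*) AS ?c) WHERE { GRAPH ?g { ?s ?p ?o } } GROUP BY ?g;'
if command -v docker >/dev/null 2>&1; then
  docker exec -i vos isql 1111 exec="$count_sql" || true  # container may be down
fi
```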
New Book: Data Exploration using Example-based Methods2018-12-06T00:00:00Zhttps://people.cs.aau.dk/notes/book-example-exploration.html<!-- 10-2023 -->
<figure class="body-figure small">
<img src="https://people.cs.aau.dk/~matteo/images/books-bokeh.jpg" alt="Printed copies of the book Data Exploration using Example-based Methods">
<figcaption>
The book (~140 pages) provides detailed examples and texts highlighting challenges in the area of example-based exploratory search.
</figcaption>
</figure>
<section class="intro secsubhead">
<p>
I’m very proud to have worked with my co-authors on a new book in the <em>Synthesis Lectures on Data Management</em> titled <a href="https://bit.ly/exemplar-book">“Data Exploration using Example-based Methods”</a>.
</p>
<p>
The book provides insights on how example-based search systems can be employed by expert and non-expert users in the realm of Data Exploration, in particular to retrieve the portion of the data that is relevant to their interest while avoiding the use of complex query languages.
</p>
</section>
<p>
We have witnessed a rediscovery of the example-based methods, which exploit inherent characteristics of the data to infer the results that the user has in mind but may not be able to (easily) express.
<b>The book presents an excursus over the main methods for exploratory analysis</b>, with a particular focus on example-based methods.
The book also presents the challenges and new frontiers of machine learning in online settings that have recently attracted the attention of the database community.
</p>
<figure class="body-figure">
<img src="https://people.cs.aau.dk/~matteo/images/exemplar-book-cover.png" alt="Cover of book: Data Exploration using Example-based Methods">
<figcaption>
Synthesis Lectures on Data Management — <a href="https://doi.org/10.2200/S00881ED1V01Y201810DTM053">Morgan & Claypool publishers</a>
</figcaption>
</figure>
<p>
With this book we have surveyed more than two hundred research sources to highlight the main example-based techniques for <b>relational, graph, and textual data</b>.
</p>
<p>
We hope this book answers the questions and builds the necessary knowledge for those interested in constructing <b>new data exploration systems</b>.
</p>
<blockquote itemscope="" itemtype="http://schema.org/Book">
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Lissandrini, Matteo</span>;
</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Mottin, Davide</span>;
</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Palpanas, Themis</span>;
</span>
and
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Velegrakis, Yannis</span>.
</span>
<br>
“<span itemprop="headline name">Data Exploration Using Example-Based Methods</span>.”
<br>
<div class="hidden">
<time datetime="2018-12-01" itemprop="datePublished">December, 2018</time>
<span itemprop="image">https://people.cs.aau.dk/~matteo/images/exemplar-book-cover.png</span>
</div>
<span itemprop="isPartOf" itemscope="" itemtype="http://schema.org/CreativeWorkSeries">
<em><span itemprop="name">Synthesis Lectures on Data Management</span></em>
</span>, <span itemprop="isPartOf" itemscope="" itemtype="http://schema.org/PublicationVolume"> <span itemprop="volumeNumber">10</span></span> <span itemprop="isPartOf" itemscope="" itemtype="http://schema.org/PublicationIssue">
(<span itemprop="issueNumber">4</span>)
(<time datetime="2018-12-01" itemprop="datePublished">December, 2018</time>):
</span>
pages: <span itemprop="numberOfPages">164</span>.
<span itemprop="publisher" itemscope="" itemtype="http://schema.org/Organization">
<span itemprop="legalName">Morgan & Claypool Publishers</span>
</span><br>
ISBN: <span itemprop="isbn">9781681734552</span>.
</blockquote>
<p>
You can <a href="https://people.cs.aau.dk/~matteo/publications/book/2018-mc-exemplar.html">find out more</a> or <a class="call-to-action" href="https://bit.ly/exemplar-book">Buy the book</a>
</p>
Data Exploration using Example-based Methods2018-12-01T00:00:00Zhttps://people.cs.aau.dk/publications/book/2018-mc-exemplar.html<h1 id="data-exploration-using-example-based-methods" tabindex="-1">Data Exploration using Example-based Methods</h1>
<h2 id="matteo-lissandrini%2C-davide-mottin%2C-themis-palpanas%2C-yannis-velegrakis" tabindex="-1">Matteo Lissandrini, Davide Mottin, Themis Palpanas, Yannis Velegrakis</h2>
<section class="links">
<a class="attachment" href="https://doi.org/10.2200/S00881ED1V01Y201810DTM053" title="Read the Book">Read the book</a>
</section>
<h3 id="preface%3A" tabindex="-1">Preface:</h3>
<section class="intro secsubead">
<figure class="body-figure">
<img src="https://people.cs.aau.dk/~matteo/images/exemplar-book-cover.png" alt="Cover of book: Data Exploration using Example-based Methods">
<figcaption>
Synthesis Lectures on Data Management — <a href="https://doi.org/10.2200/S00881ED1V01Y201810DTM053">Morgan & Claypool publishers</a>
</figcaption>
</figure>
<p>
<strong>Exploration is one of the primordial ways to accrue knowledge about the world and its nature.</strong>
It describes the act of becoming familiar with something by testing or experimenting, and at the same time it evokes the image of a traveler traversing a new territory.
As we accumulate, mostly automatically, data at unprecedented volumes and speed, our datasets have become less and less familiar to us.
In this context we speak of <strong>exploratory search</strong> as the process of gradual discovery and understanding of the portion of the data that is pertinent to a user's often vague information need.
Contrary to traditional search, where the desired result is well defined and the focus is on precision and performance, exploratory search usually starts from a <em>tentative query</em> that hopefully leads to answers that are at least partially relevant and that can provide cues about the next query.
By understanding the distinction between a traditional query and an exploratory query, we can change the semantics of the user input: instead of a strict prescription of the contents of the result-set, we provide a hint of what is relevant.
This shift in semantics has led to a number of methods having in common the very specific paradigm of <em>search by-example</em>.
Search by-example receives as query a set of example members of the answer set.
The search system then infers the entire answer set based on the given examples and any additional information provided by the underlying database.
</p>
</section>
<p>
With this book we have surveyed more than two hundred research sources to highlight the main example-based techniques for relational, graph, and textual data.
The book provides insights on how these example-based search systems can be employed by expert and non-expert users to retrieve the portion of the data that is relevant to their interest, while avoiding the use of complex query languages.
We hope this book answers the questions of, and builds the necessary knowledge for, those interested in constructing new data exploration systems.
</p>
<figure class="body-figure">
<img src="https://people.cs.aau.dk/~matteo/images/ch01-dataexploration.png" alt="Data Exploration using Example-based Methods: Data-models">
<figcaption>
This book covers example-based techniques for different data models.
</figcaption>
</figure>
<p>
Graduate students will hopefully deepen their interest in the subject and become involved in the new challenges and opportunities enabled by the powerful exploration method of search-by-example.
Researchers and practitioners working in the area will likely find new insights for further improving their approaches and systems.
</p>
<h3 id="more-info" tabindex="-1">More info</h3>
<p><a class="attachment" href="https://people.cs.aau.dk/~matteo/pdf/Exemplar_Book_Flyer.pdf" title="Read the flyer">Read more (PDF)</a>, <a class="attachment" href="http://www.morganclaypoolpublishers.com/content/9781681734569_sample.pdf" title="Read the sample chapter">Check the sample chapter (PDF)</a>, or</p>
<p><a class="call-to-action" href="https://bit.ly/exemplar-book" title="buy the book">Buy the book</a></p>
<h3 id="cite%3A" tabindex="-1">Cite:</h3>
<blockquote itemscope="" itemtype="http://schema.org/Book">
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Lissandrini, Matteo</span>;
</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Mottin, Davide</span>;
</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Palpanas, Themis</span>;
</span>
and
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Velegrakis, Yannis</span>.
</span>
<br>
“<span itemprop="headline name">Data Exploration Using Example-Based Methods</span>.”
<br>
<div class="hidden">
<time datetime="2018-12-01" itemprop="datePublished">December, 2018</time>
<span itemprop="image">https://people.cs.aau.dk/~matteo/images/exemplar-book-cover.png</span>
</div>
<span itemprop="isPartOf" itemscope="" itemtype="http://schema.org/CreativeWorkSeries">
<em><span itemprop="name">Synthesis Lectures on Data Management</span></em>
</span>, <span itemprop="isPartOf" itemscope="" itemtype="http://schema.org/PublicationVolume"> <span itemprop="volumeNumber">10</span></span> <span itemprop="isPartOf" itemscope="" itemtype="http://schema.org/PublicationIssue">
(<span itemprop="issueNumber">4</span>)
(<time datetime="2018-12-01" itemprop="datePublished">December, 2018</time>):
</span>
pages: <span itemprop="numberOfPages">164</span>.
<span itemprop="publisher" itemscope="" itemtype="http://schema.org/Organization">
<span itemprop="legalName">Morgan & Claypool Publishers</span>
</span><br>
ISBN: <span itemprop="isbn">9781681734552</span>.
</blockquote>
<pre><code class="lang-bibtex">
@book{lissandrini2018data,
title={Data Exploration Using Example-Based Methods},
author={Lissandrini, Matteo and
Mottin, Davide and
Palpanas, Themis and
Velegrakis, Yannis},
series={Synthesis Lectures on Data Management},
volume={10},
number={4},
pages={1--164},
year={2018},
publisher={Morgan \& Claypool Publishers}
}
</code></pre>
<!-- 10-2023 -->
Beyond Macrobenchmarks: Microbenchmark-based Graph Database Evaluation2018-12-01T00:00:00Zhttps://people.cs.aau.dk/publications/journal/2018-vldb-gdb.html<h1 id="beyond-macrobenchmarks%3A-microbenchmark-based-graph-database-evaluation" tabindex="-1">Beyond Macrobenchmarks: Microbenchmark-based Graph Database Evaluation</h1>
<h2 id="matteo-lissandrini%2C-martin-brugnara%2C-yannis-velegrakis" tabindex="-1">Matteo Lissandrini, Martin Brugnara, Yannis Velegrakis</h2>
<section class="links">
<p>
Download a digital copy (<a class="attachment" href="https://people.cs.aau.dk/~matteo/pdf/VLDB18-microbenchmark-lissandrini.pdf">PDF</a>)
or see <a href="https://people.cs.aau.dk/~matteo/pdf/VLDB19-microbenchmark-poster.pdf">the Poster (PDF)</a>
and <a href="https://people.cs.aau.dk/~matteo/pdf/VLDB19-microbenchmark-SLIDES.pdf"> the Slides (PDF)</a>
</p>
<p>
You can also <a class="attachment" href="https://people.cs.aau.dk/~matteo/gdb.html">read more about the project</a>.
</p><p>
The final authenticated version is available online at <a href="https://doi.org/10.14778/3297753.3297759">10.14778/3297753.3297759</a>.
</p></section>
<h3 id="abstract%3A" tabindex="-1">Abstract:</h3>
<section class="intro secsubead">
Despite the increasing interest in graph databases, their requirements and specifications are not yet fully understood by everyone, leading to a great deal of variation in the supported functionalities and the achieved performance.
We provide a comprehensive study of the existing graph database systems.
We introduce a novel micro-benchmarking framework that provides insights on their performance that go beyond what macro-benchmarks can offer.
We have identified and included in our framework the largest set of queries and operators; we have evaluated the systems on both synthetic and real data, from different domains, and at much larger scales than any previous work.
We materialized our evaluation framework in an open-source suite that can be easily extended with new datasets, systems, or queries.
</section>
<h3 id="cite%3A" tabindex="-1">Cite:</h3>
<blockquote itemscope="" itemtype="http://schema.org/ScholarlyArticle">
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Lissandrini, Matteo</span>;
</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Brugnara, Martin</span>;
</span>
and
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Velegrakis, Yannis</span>.
</span>
<br>
“<span itemprop="headline name">Beyond Macrobenchmarks: Microbenchmark-based Graph Database Evaluation</span>.”
<br>
<div class="hidden">
<time datetime="2018-12-01" itemprop="datePublished">December, 2018</time>
<span itemprop="image">https://people.cs.aau.dk/~matteo/images/gdb-logo.png</span>
</div>
<span itemprop="isPartOf" itemscope="" itemtype="http://schema.org/Periodical">
<em><span itemprop="name">Proceedings of the VLDB Endowment</span></em>
</span>
<span itemprop="isPartOf" itemscope="" itemtype="http://schema.org/PublicationVolume">
<span itemprop="volumeNumber">12</span>
</span>, <span itemprop="isPartOf" itemscope="" itemtype="http://schema.org/PublicationIssue">
(<span itemprop="issueNumber">4</span>)
(<time datetime="2018-12-01" itemprop="datePublished">December, 2018</time>):
</span>
<span itemprop="pageStart">390</span>-<span itemprop="pageEnd">403</span>.
</blockquote>
<pre><code class="lang-bibtex">
@article{Lissandrini:2018:GDB,
author = {Lissandrini, Matteo and Brugnara, Martin and Velegrakis, Yannis},
title = {Beyond Macrobenchmarks: Microbenchmark-based Graph Database Evaluation},
journal = {PVLDB},
issue_date = {December 2018},
volume = {12},
number = {4},
month = dec,
year = {2018},
pages = {390--403},
numpages = {14},
url = {https://doi.org/10.14778/3297753.3297759},
doi = {10.14778/3297753.3297759},
publisher = {VLDB Endowment}
}
</code></pre>
Starting in a new Role!2018-09-15T00:00:00Zhttps://people.cs.aau.dk/notes/new-position.html<!-- 10-2023 -->
<section class="intro secsubhead">
<p>
I’m very excited to start as a postdoctoral researcher in the Department of Computer Science at Aalborg University.
</p>
<p>I will continue my research in the field of <b>Exploratory Analytics for Information Graphs.</b>
The goal will be to study new data exploration functionalities by implementing dedicated Business Intelligence (BI) and Analytics operators enabled by the Exemplar Query paradigm.
</p>
</section>
<figure class="show-logo body-figure">
<img class="img-thumbnail" alt="AAU + Exemplar Search + RDF" src="https://people.cs.aau.dk/~matteo/images/xolap-search.png">
</figure>
<p>To extract relevant and actionable knowledge from rich information graphs, analysts and researchers must face large amounts of interlinked information that is produced and shared by different actors with an unpredictable and heterogeneous structure.
<b>This is particularly evident in the realm of Linked Open Data (LOD).</b></p>
<p>Moreover, the typical gateways to these repositories are specialized query languages that are usually challenging for non-expert users.
This drastically reduces the real accessibility of these assets, which is in clear contrast with the goal for which LOD were created in the first place.</p>
<p>Hence, there is a primary need to support the understanding of the information LOD repositories represent and to provide easy access to their content.
To address this need, we aim to provide a data exploration system for LOD.</p>
X2Q: Your Personal Example-based Graph Explorer2018-08-01T00:00:00Zhttps://people.cs.aau.dk/publications/demo/2018-vldb-exq.html<h1 id="x2q%3A-your-personal-example-based-graph-explorer" tabindex="-1"><a href="http://www.vldb.org/pvldb/vol11/p2026-lissandrini.pdf">X2Q: Your Personal Example-based Graph Explorer</a></h1>
<h2 id="matteo-lissandrini%2C-davide-mottin%2C-themis-palpanas%2C-yannis-velegrakis" tabindex="-1">Matteo Lissandrini, Davide Mottin, Themis Palpanas, Yannis Velegrakis</h2>
<section class="links">
<p>
Download a digital copy (<a class="attachment" href="https://people.cs.aau.dk/~matteo/pdf/VLDB18-x2q.pdf">PDF</a>)
or watch <a class="attachment" href="https://people.cs.aau.dk/~matteo/files/X2Q-Demo.avi">the Demo Video (9.3MB, <code>.avi</code>)</a>
</p><p>also <a class="attachment" href="https://www.youtube.com/watch?v=A1_dKvX5ZRk" title="Watch the Demo Video">on streaming (YouTube)</a></p>
<p></p><p>
The final authenticated version is available online at <a href="https://doi.org/10.14778/3229863.3236251">10.14778/3229863.3236251</a>.
</p></section>
<h3 id="abstract%3A" tabindex="-1">Abstract:</h3>
<section class="intro secsubead">
Exploring knowledge graphs can be a daunting task for any user, expert or novice.
This may be due to the complexity of the schema, because they are unfamiliar with the contents of the data, or even because they do not know precisely what they are looking for.
For these reasons, there is a significant demand for exploratory methods for this kind of data.
We propose X2Q, a system that facilitates the exploration of knowledge graphs with a hands-on approach.
X2Q embodies the flexible multi-exemplar query paradigm, in which easy-to-express examples serve as the basis for formulating sophisticated, hard-to-express queries.
Our system helps build examples in an interactive fashion, by showing results of the partial exemplar query as well as suggestions for improving the current examples.
Then, the user feedback is incorporated in our scores to filter out irrelevant suggestions upfront.
X2Q returns answers in real-time on Freebase, one of the largest available knowledge graphs.
</section>
<h3 id="cite%3A" tabindex="-1">Cite:</h3>
<blockquote itemscope="" itemtype="http://schema.org/ScholarlyArticle">
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Lissandrini, Matteo</span>;
</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Mottin, Davide</span>;
</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Palpanas, Themis</span>;
</span>
and
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Velegrakis, Yannis</span>.
</span>
<br>
“<span itemprop="headline name">X2Q: Your Personal Example-based Graph Explorer</span>.”
<div class="hidden">
<time datetime="2018-01-01" itemprop="datePublished">August, 2018</time>
</div>
<span itemprop="isPartOf" itemscope="" itemtype="http://schema.org/Periodical">
<em><span itemprop="name">Proceedings of the Conference in Very Large Databases (PVLDB), 11 (12): 2018</span></em>
</span> (pp. <span itemprop="pageStart">2026</span>-<span itemprop="pageEnd">2029</span>).
</blockquote>
<pre><code class="lang-bibtex">
@inproceedings{Lissandrini:2018:X2Q,
author = {Lissandrini, Matteo and Mottin, Davide and Palpanas, Themis and Velegrakis, Yannis},
title = {X2Q: Your Personal Example-based Graph Explorer},
booktitle = {Proceedings of the Conference in Very Large Databases (PVLDB), 11 (12) 2018},
series = {VLDB '18},
year = {2018},
location = {Rio, Brazil},
pages = {2026--2029},
numpages = {4},
publisher = {ACM},
doi = {10.14778/3229863.3236251},
address = {New York, NY, USA},
keywords = {exploration, exemplar queries, labeled graphs, query paradigms},
}
</code></pre>
T.S. Eliot on Knowledge and Exploration2018-07-05T00:00:00Zhttps://people.cs.aau.dk/notes/quotes-ts-eliot.html<!-- 10-2023 -->
<blockquote class="important">
<p>Where is the Life we have lost in living?<br>
Where is the wisdom we have lost in knowledge?<br>
Where is the knowledge we have lost in information?<br>
</p>
<footer>
— <cite><a href="https://en.wikiquote.org/wiki/T._S._Eliot">Choruses from “The Rock” (1934)</a> by T. S. Eliot</cite>
</footer>
</blockquote>
<blockquote class="important">
<p>We shall not cease from exploration<br>
And the end of all our exploring<br>
Will be to arrive where we started<br>
And know the place for the first time.<br>
</p>
<footer>
— <cite><a href="https://en.wikiquote.org/wiki/Four_Quartets#Little_Gidding_(1942)">“Little Gidding” (1942)</a> by T. S. Eliot</cite>
</footer>
</blockquote>
Enabling Access to and Exploration of Information Graphs2018-04-27T00:00:00Zhttps://people.cs.aau.dk/notes/thesis.html<!-- 10-2023 -->
<section class="intro secsubhead">
<p>
Exploratory search is the new frontier of information consumption as it goes well beyond simple <em>lookups</em>.
Information repositories are ubiquitous and grow larger every day, and automated search systems help users find information in such collections.
</p>
</section>
<h3 id="the-topic-of-my-thesis-dissertation" tabindex="-1">The topic of my Thesis Dissertation</h3>
<p>To extract knowledge from these repositories, the common “query lookup” retrieval paradigm accepts a set of specifications (the query) that describes the objects of interest and then collects such objects.
Yet, the query lookup retrieval paradigms commonly in use are no longer sufficient to support complex information needs, as they can only provide candidate starting points and do not help the user in expanding their knowledge.
To ease access and consumption of rich information repositories, we address the crucial problem of data exploration.
Exploratory tasks match the natural need for finding answers to open-ended information needs within an unfamiliar environment.</p>
<p>In particular, in this dissertation, we focus on enabling access to and exploration of rich information graphs.
Within businesses, organizations, and among researchers, data is produced in many forms, large volumes, and different contexts.
As a consequence of this heterogeneity, many applications find it more useful to model their datasets with the graph model, where information is represented with entities (nodes) and relationships (edges).
Those are the data graphs, the graph databases, the knowledge graphs, or more generally information graphs.
The richness of their schema and of their content makes it challenging for users to express appropriate queries and retrieve the desired results.
Hence, to allow an effective exploration of a graph, we require:
(i) an expressive <strong>query paradigm</strong>,
(ii) an intuitive <strong>query mechanism</strong>,
and (iii) an appropriate <strong>storage and query processing system</strong>.
In this work, we address these three requirements.</p>
<p>An exploratory query should be simple enough to avoid complicated declarative languages (such as SQL or SPARQL), and at the same time it should retain the flexibility and expressiveness of such languages.
For this reason, with respect to the query paradigm, we introduce the notion of <strong>exemplar queries</strong> and propose extensions to handle multiple incomplete examples.
An exemplar query is a query method in which the user, or the analyst, circumvents query languages by using examples as input.
In particular, the solution we design allows flexible matching in the case of incomplete or partially specified examples.</p>
<p>Moreover, to enable this query paradigm, there is the need for interactive systems that implement an incremental query-construction mechanism and interactive explorations.
To address this need, we study algorithms and implementations based on pseudo-relevance feedback for <strong>exemplar query suggestion</strong>, along with an in-depth study of their effectiveness.</p>
<p>Finally, as there exist many graph databases, high heterogeneity can be observed in the functionalities and performance of these systems.
We provide an exhaustive evaluation methodology and a comprehensive study of the existing systems that allows one to understand their capabilities and limitations.
In particular, we design a novel micro-benchmarking framework for the assessment of the functionalities of some graph databases among the most prominent in the area and provide detailed insights on their performance.</p>
Using Schema.org Notation for a Personal Academic Page2017-12-23T00:00:00Zhttps://people.cs.aau.dk/notes/schema-org.html<!-- 10-2023 -->
<section class="intro secsubhead">
<p>
Using Schema.org notation to enrich your HTML with structured data provides additional information to a search engine and helps it display the right relevant information on its result page.
For an academic personal page, this can help people find your contact information, affiliation, and publications.
</p>
</section>
<h2 id="schema.org" tabindex="-1"><a href="http://schema.org/">Schema.org</a></h2>
<p>Semantic notations for HTML documents have a long history.
I've always been a <a href="https://microformats.org/">microformats</a> aficionado, but apparently Google is ignoring those.
I'm honestly confused; you can read <a href="https://softwareengineering.stackexchange.com/a/166669">some more information around</a>, but for now I will (probably cowardly) proceed with the most sponsored way.</p>
<blockquote>
<p>
Schema.org is a collaborative, community activity with a mission to create, maintain, and promote schemas for structured data on the Internet, on web pages, in email messages, and beyond.
</p>
<p>
Schema.org vocabulary can be used with many different encodings, including RDFa, Microdata and JSON-LD.
These vocabularies cover entities, relationships between entities and actions, and can easily be extended through a well-documented extension model.
Over 10 million sites use Schema.org to markup their web pages and email messages.
Many applications from Google, Microsoft, Pinterest, Yandex and others already use these vocabularies to power rich, extensible experiences.
</p>
<footer>
— <cite><a href="https://schema.org/">“Schema.org”</a></cite>
</footer>
</blockquote>
<p>The basic idea is to write your content down in plain HTML, add some <del>bloated</del> additional markup, and when a search engine parses your page, it will pick up this extra information and possibly show it in a more prominent or readable way on the search-result page.</p>
<p>There is a long philosophical detour we could take, about the fact that this will actively disincentivize people from visiting our websites, having them stop at the search-engine result page instead, but I'll leave that for another time.</p>
<p>In this case, one can either adopt the JSON-LD format, or the <a href="http://schema.org/">Schema.org</a> format.
I am in favor of enriching the document markup, and against adding javascript metadata to a page.</p>
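<p>For comparison, the JSON-LD route (which, as said, I am not taking) would pack the same information into a single <code>script</code> element of type <code>application/ld+json</code>. The following is only an illustrative sketch, mirroring the contact example below rather than copying my actual page:</p>

```json
{
  "@context": "https://schema.org",
  "@type": "Person",
  "name": "Matteo Lissandrini",
  "jobTitle": "Assistant Professor",
  "url": "https://people.cs.aau.dk/~matteo",
  "affiliation": {
    "@type": "Organization",
    "name": "Department of Computer Science"
  }
}
```

<p>Both encodings use the same vocabulary; the difference is whether the metadata lives inline in the visible markup or in a separate, invisible data island.</p>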
<p>I will show two examples: the personal contact snippet, and the publication.
They may not be 100% correct, but they pass the <a href="https://search.google.com/structured-data/testing-tool">Google Structured-Data Validator</a>, and they are currently online on this very website.</p>
<p>In general, every self-contained piece of information is identified by an <code>itemscope</code>, and each item scope is characterized by an <code>itemtype</code>.
For instance, if your markup is about a person, then you can look on <a href="https://schema.org/Person">schema.org/Person</a> and you will find all the properties that you can mark up.
These should be HTML elements inside the parent element with <code>itemscope</code> and should be defined with an <code>itemprop</code> attribute/value pair.
For instance the full name <code><span itemprop="name">Matteo Lissandrini</span></code> or
<code><span itemprop="givenName">Matteo</span></code> along with <code><span itemprop="familyName">Lissandrini</span></code>.</p>
<p>Note that you can nest items inside each other.
For instance, you can have an <code>itemprop="address"</code> which is also an item scope with type <code>itemtype="https://schema.org/PostalAddress"</code>.
You will see this below.</p>
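<p>To make the mechanics concrete, here is a minimal sketch of how a crawler could collect <code>itemprop</code> values from such markup, using only Python's standard library. The class name and the toy snippet are mine, purely illustrative; a real search engine does much more (nested scopes, typed values), but this is the gist of how the extra attributes become machine-readable data.</p>

```python
# A minimal, illustrative sketch: collect itemprop values from
# microdata-annotated HTML with Python's stdlib HTML parser.
from html.parser import HTMLParser

class ItempropExtractor(HTMLParser):
    """Collects itemprop -> text pairs from microdata-annotated HTML."""
    def __init__(self):
        super().__init__()
        self._current = None   # itemprop of the most recently opened element
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemprop" in attrs:
            self._current = attrs["itemprop"]

    def handle_data(self, data):
        # Text directly inside an itemprop element becomes its value.
        if self._current and data.strip():
            self.properties[self._current] = data.strip()
            self._current = None

snippet = ('<span itemscope itemtype="https://schema.org/Person">'
           '<span itemprop="name">Matteo Lissandrini</span>, '
           '<em itemprop="jobTitle">Assistant Professor</em></span>')

extractor = ItempropExtractor()
extractor.feed(snippet)
print(extractor.properties)
# → {'name': 'Matteo Lissandrini', 'jobTitle': 'Assistant Professor'}
```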
<h2 id="the-personal-contact" tabindex="-1">The Personal Contact</h2>
<p>This is from my home page, but I have removed a lot of things here, to make it more readable and focus only on the important bits.</p>
<p>
<b>NOTICE: DO NOT COPY PASTE! It won't work.</b><br>
The code below embeds a magic, non-visible zero-width space character after each <code>&lt;</code>, which allows my HTML code to display verbatim.
</p>
<pre><code>
<‍aside itemscope itemtype="https://schema.org/Person">
<‍figure>
<‍img itemprop="image" src="https://people.cs.aau.dk/~matteo/images/me.jpg" alt="Matteo Lissandrini's photo" />
<‍address>
<‍strong class="secsubhead">Contact:<‍/strong><‍br />
<‍strong itemprop="name">Matteo Lissandrini<‍/strong>
<‍a itemprop="url" href="https://people.cs.aau.dk/~matteo">https://people.cs.aau.dk/~matteo<‍/a>
<‍a itemprop="email" href="mailto:matteo@cs.aau.dk">matteo@cs.aau.dk<‍/a>
<‍div itemprop="address" itemscope itemtype="https://schema.org/PostalAddress">
<‍span itemprop="streetAddress">Selma Lagerløfs Vej, 300<‍br />Aalborg University<‍/span>
<‍span itemprop="postalCode">9220<‍/span> —
<‍span itemprop="addressLocality">Aalborg<‍/span><‍br />
<‍span itemprop="addressCountry">Denmark<‍/span>
<‍/div>
<‍/address>
<‍p>
<‍em itemprop="jobTitle">Assistant Professor<‍/em>
in the
<‍span itemprop="affiliation" itemscope itemtype="https://schema.org/Organization">
<‍a itemprop="url" href="https://www.cs.aau.dk/">
<‍span itemprop="name">Department of Computer Science<‍/span>
<‍/a>
<‍/span>
<‍/p>
<‍/aside>
</code></pre>
<h2 id="the-publication" tabindex="-1">The Publication</h2>
<p>This is one of my publications; the item type is <code>ScholarlyArticle</code> and it has pretty rich properties:
you can read them all on <a href="https://schema.org/ScholarlyArticle">schema.org/ScholarlyArticle</a>.</p>
<p>Important to note here is that the <a href="http://schema.org/">Schema.org</a> specification requires both a <code>name</code> and a <code>headline</code> for the article.
Apparently the <code>name</code> property is not really exploited, so you can conflate the two with <code>itemprop="headline name"</code>.</p>
<p>
<b>NOTICE: DO NOT COPY PASTE! It won't work.</b><br>
The code below embeds a magic, non-visible zero-width space character after each <code>&lt;</code>, which allows my HTML code to display verbatim.
</p>
<pre><code>
<‍blockquote itemscope itemtype="https://schema.org/ScholarlyArticle">
<‍span itemprop="author" itemscope itemtype="https://schema.org/Person">
<‍span itemprop="name">Mottin, Davide<‍/span>;
<‍/span>
<‍span itemprop="author" itemscope itemtype="https://schema.org/Person">
<‍span itemprop="name">Lissandrini, Matteo<‍/span>;
<‍/span>
<‍span itemprop="author" itemscope itemtype="https://schema.org/Person">
<‍span itemprop="name">Velegrakis, Yannis<‍/span>;
<‍/span>
and
<‍span itemprop="author" itemscope itemtype="https://schema.org/Person">
<‍span itemprop="name">Palpanas, Themis<‍/span>.
<‍/span>
<‍br />
“<‍span itemprop="headline name">Exemplar Queries: A New Way of Searching<‍/span>.”
<‍div class="hidden">
<‍time datetime="2016-12-01" itemprop="datePublished">December, 2016<‍/time>
<‍span itemprop="image">https://disi.unitn.it/~lissandrini/images/xq-logo.png<‍/span>
<‍/div>
<‍div itemprop="isPartOf" itemscope itemtype="https://schema.org/Periodical">
<‍em><‍span itemprop="name">The VLDB Journal<‍/span><‍/em>
<‍/div>
<‍span itemprop="isPartOf" itemscope itemtype="https://schema.org/PublicationVolume">
<‍span itemprop="volumeNumber">25<‍/span>
<‍/span>,
<‍span itemprop="isPartOf" itemscope itemtype="https://schema.org/PublicationIssue">
(<‍span itemprop="issueNumber">6<‍/span>)
(<‍time datetime="2016-12-01" itemprop="datePublished">December, 2016<‍/time>):
<‍/span>
<‍span itemprop="pageStart">741<‍/span>-<‍span itemprop="pageEnd">765<‍/span>.
<‍/blockquote>
</code></pre>
Multi-Example Search in Rich Information Graphs2017-12-23T00:00:00Zhttps://people.cs.aau.dk/publications/conference/2017-icde-multexq.html<h1 id="multi-example-search-in-rich-information-graphs" tabindex="-1">Multi-Example Search in Rich Information Graphs</h1>
<h2 id="matteo-lissandrini%2C-davide-mottin%2C-themis-palpanas%2C-yannis-velegrakis" tabindex="-1">Matteo Lissandrini, Davide Mottin, Themis Palpanas, Yannis Velegrakis</h2>
<section class="links">
<p>
Download a digital copy (<a class="attachment" href="https://people.cs.aau.dk/~matteo/pdf/ICDE18-mexq-SLIDES_export.pdf">PDF</a>)
or get <a href="https://people.cs.aau.dk/~matteo/pdf/ICDE18-mexq-poster.pdf">the Poster (PDF)</a> and <a href="https://people.cs.aau.dk/~matteo/pdf/ICDE18-mexq-export.pdf">Slides (PDF)</a>
</p><p>
The final authenticated version is available online at <a href="https://doi.org/10.1109/ICDE.2018.00078">10.1109/ICDE.2018.00078</a>.
</p></section>
<h3 id="abstract-%3A" tabindex="-1">Abstract:</h3>
<section class="intro secsubead">
In rich information spaces, it is often hard for users to formally specify the characteristics of the desired answers, either due to the complexity of the schema or of the query language, or even because they do not know exactly what they are looking for.
Exemplar queries constitute a query paradigm that overcomes those problems, by allowing users to provide examples of the elements of interest in place of the query specification.
In this paper, we propose a general approach where the user-provided example can comprise several partial specification fragments,
where each fragment describes only one part of the desired result.
We provide a formal definition of the problem, which generalizes existing formulations for both the relational and the graph model.
We then describe exact algorithms for its solution for the case of information graphs, as well as top-k algorithms.
Experiments on large real datasets demonstrate the effectiveness and efficiency of the proposed approach.
</section>
<!-- ### Presented <a href="https://icde2018.org/index.php/program/research-track-on-wednesday/#research-16">Wed. April 19th, in room 21.2.32</a> -->
<h3 id="cite%3A" tabindex="-1">Cite:</h3>
<blockquote itemscope="" itemtype="http://schema.org/ScholarlyArticle">
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Lissandrini, Matteo</span>;
</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Mottin, Davide</span>;
</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Palpanas, Themis</span>;
</span>
and
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Velegrakis, Yannis</span>.
</span>
<br>
“<span itemprop="headline name">Multi-Example Search in Rich Information Graphs</span>.”
<br>
<span itemprop="isPartOf" itemscope="" itemtype="http://schema.org/Periodical">
<em><span itemprop="name">Proceedings of the 34th IEEE International Conference on Data Engineering, ICDE 2018</span></em>
</span> (pp. <span itemprop="pageStart">809</span>-<span itemprop="pageEnd">820</span>).
<div class="hidden">
<time datetime="2017-12-23" itemprop="datePublished">December, 2017</time>
<span itemprop="image">https://people.cs.aau.dk/~matteo/images/xq-logo.png</span>
</div>
</blockquote>
<pre><code class="lang-bibtex">
<!-- -->
@inproceedings{Lissandrini:ICDE18,
author = {Lissandrini, Matteo and Mottin, Davide and Velegrakis, Yannis and Palpanas, Themis},
title = {{Multi-Example Search in Rich Information Graphs}},
booktitle = {34th {IEEE} International Conference on Data Engineering, {ICDE} 2018, Paris, France, April 16-19, 2018},
pages = {809--820},
year = {2018},
organization={IEEE},
url = {https://doi.org/10.1109/ICDE.2018.00078},
doi = {10.1109/ICDE.2018.00078}
}
<!-- -->
</code></pre>
Beyond Frequencies: Graph Pattern Mining in Multi-weighted Graphs2017-11-18T00:00:00Zhttps://people.cs.aau.dk/publications/conference/2017-edbt-relevant-patterns.html<h1 id="beyond-frequencies%3A-graph-pattern-mining-in-multi-weighted-graphs" tabindex="-1">Beyond Frequencies: Graph Pattern Mining in Multi-weighted Graphs</h1>
<h2 id="giulia-preti%2C-matteo-lissandrini%2C-davide-mottin%2C-yannis-velegrakis" tabindex="-1">Giulia Preti, Matteo Lissandrini, Davide Mottin, Yannis Velegrakis</h2>
<section class="links">
<p>
Download a digital copy (<a class="attachment" href="https://people.cs.aau.dk/~matteo/pdf/EDBT18-resum.pdf">PDF</a>)
or <a class="attachment" href="https://people.cs.aau.dk/~matteo/pdf/EDBT18-resum-poster.pdf">See the poster (PDF)</a>
</p><p>
The final authenticated version is available online at <a href="https://doi.org/10.5441/002/edbt.2018.16">10.5441/002/edbt.2018.16</a>.
</p></section>
<h3 id="abstract%3A" tabindex="-1">Abstract:</h3>
<section class="intro secsubead">
Graph pattern mining aims at identifying structures that appear frequently in large graphs, under the assumption that frequency signifies importance.
Several measures of frequency have been proposed that respect the <em>apriori</em> property, pivotal to an efficient search of the patterns.
This property states that the number of appearances of a pattern in a graph cannot be more than the frequency of any of its sub-patterns.
In real life, there are many graphs with weights on nodes and/or edges.
For these graphs, it is fair that the importance (score) of a pattern is determined not only by the number of its appearances, but also by the weights on the nodes/edges of those appearances.
Scoring functions based on the weights do not generally satisfy the apriori property, thus forcing many approaches to employ other, less efficient, pruning strategies to speed up the computation.
The problem becomes even more challenging in the case of multiple weighting functions that assign different weights to the same nodes/edges.
In this work, we provide efficient and effective techniques for mining patterns in multi-weight graphs.
We devise both an exact and an approximate solution. The first is characterized by intelligent storage and computation of the pattern scores, while the second is based on the aggregation of similar weighting functions to allow scalability and avoid redundant computations.
Both methods adopt a scoring function that respects the apriori property, and thus they can rely on effective pruning strategies.
We present a set of experiments to illustrate the efficiency and effectiveness of our approach.
</section>
<h3 id="cite%3A" tabindex="-1">Cite:</h3>
<blockquote itemscope="" itemtype="http://schema.org/ScholarlyArticle">
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Preti, Giulia</span>;
</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Lissandrini, Matteo</span>;
</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Mottin, Davide</span>;
</span>
and
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Velegrakis, Yannis</span>.
</span>
<br>
“<span itemprop="headline name">Beyond Frequencies: Graph Pattern Mining in Multi-weighted Graphs</span>.”
<br>
<span itemprop="isPartOf" itemscope="" itemtype="http://schema.org/Periodical">
<em><span itemprop="name">Proceedings of the 21st International Conference on Extending Database Technology, EDBT 2018</span></em>
</span> (<span itemprop="pageStart">169</span>-<span itemprop="pageEnd">180</span>).
<div class="hidden">
<time datetime="2017-11-18" itemprop="datePublished">November, 2017</time>
<span itemprop="image">https://people.cs.aau.dk/~matteo/images/graph-logo.png</span>
</div>
</blockquote>
<pre><code class="lang-bibtex">
@inproceedings{PretiLMV18,
author = {Giulia Preti and
Matteo Lissandrini and
Davide Mottin and
Yannis Velegrakis},
title = {Beyond Frequencies: Graph Pattern Mining in Multi-weighted Graphs},
  booktitle = {Proceedings of the 21st International Conference on Extending Database Technology, {EDBT} 2018, Vienna, Austria, March 26-29, 2018.},
pages = {169--180},
year = {2018},
url = {https://doi.org/10.5441/002/edbt.2018.16},
doi = {10.5441/002/edbt.2018.16}
}
</code></pre>
New Trends on Exploratory Methods for Data Analytics2017-08-30T00:00:00Zhttps://people.cs.aau.dk/publications/tutorial/2017-vldb-exemplar-tutorial.html<h1 id="new-trends-on-exploratory-methods-for-data-analytics" tabindex="-1">New Trends on Exploratory Methods for Data Analytics</h1>
<h2 id="davide-mottin%2C-matteo-lissandrini%2C-yannis-velegrakis%2C-themis-palpanas" tabindex="-1">Davide Mottin, Matteo Lissandrini, Yannis Velegrakis, Themis Palpanas</h2>
<div class="box-special">
<p>
The content of this tutorial has been expanded in a book: <br><a href="https://people.cs.aau.dk/~matteo/publications/book/2018-mc-exemplar.html" class="call-to-action">Find out more!</a>
</p>
<p>You can also visit the official website: <a href="https://data-exploration.ml/">data-exploration.ml</a></p>
</div>
<h3 id="abstract%3A" tabindex="-1">Abstract:</h3>
<section class="intro secsubead">
<p>
Data usually comes in a plethora of formats and dimensions,
rendering the exploration and information extraction
processes cumbersome.
Thus, being able to cast exploratory queries over the data, with the intent of getting an immediate glimpse of some of its properties, is becoming crucial.
An exploratory query should be simple enough to avoid
complicated declarative languages (such as SQL) and mechanisms,
and at the same time retain the flexibility and expressiveness
of such languages. Recently, we have witnessed a rediscovery of the so-called example-based methods, in which the user or the analyst circumvents query languages by using examples as input.
</p>
<p>
An example is a representative
of the intended results, or in other words, an item from the
result set. Example-based methods exploit inherent characteristics
of the data to infer the results that the user has in
mind, but may not be able to (easily) express.
</p>
<p>
They can be useful both in cases where a user is looking for information
in an unfamiliar dataset, or simply when she is exploring
the data without knowing what to find in there. In this
tutorial, we present an excursus over the main methods for
exploratory analysis, with a particular focus on example-based
methods. We show how different data types require
different techniques, and present algorithms that are specifically
designed for relational, textual, and graph data.
</p>
<p>
<b>Read the full overview of the</b> <a class="attachment" href="https://people.cs.aau.dk/~matteo/pdf/VLDB17-p893-mottin-errata.pdf" title="Download a digital copy">Tutorial (PDF)</a>
</p>
</section>
<h3 id="the-tutorial-covers%3A" tabindex="-1">The tutorial covers:</h3>
<ol>
<li><strong>Example methods in relational databases</strong></li>
<li><strong>Example methods in textual data</strong></li>
<li><strong>Example methods in graphs</strong></li>
<li><strong>Learning methods based on examples</strong></li>
</ol>
<p><a class="call-to-action" href="https://people.cs.aau.dk/~matteo/slides/VLDB2017-ExploratoryMethods.pdf">Download The Slides (PDF)</a> <a class="call-to-action" href="https://people.cs.aau.dk/~matteo/slides/VLDB2017-ExploratoryMethods.zip">Download The Slides (PPT)</a></p>
<h3 id="cite%3A" tabindex="-1">Cite:</h3>
<blockquote itemscope="" itemtype="http://schema.org/ScholarlyArticle">
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Mottin, Davide</span>;
</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Lissandrini, Matteo</span>;
</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Velegrakis, Yannis</span>;
</span>
and
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Palpanas, Themis</span>.
</span>
<br>
“<span itemprop="headline name">New trends on exploratory methods for data analytics</span>.”
<br>
<div class="hidden">
<time datetime="2017-08-27" itemprop="datePublished">August, 2017</time>
<span itemprop="image">https://people.cs.aau.dk/~matteo/images/xq-logo.png</span>
</div>
<span itemprop="isPartOf" itemscope="" itemtype="http://schema.org/Periodical">
<em><span itemprop="name">Proceedings of the VLDB Endowment</span></em>
</span>, <span itemprop="isPartOf" itemscope="" itemtype="http://schema.org/PublicationVolume">
<span itemprop="volumeNumber">10</span>
</span>,
<span itemprop="isPartOf" itemscope="" itemtype="http://schema.org/PublicationIssue">
(<span itemprop="issueNumber">12</span>)
(<time datetime="2017-08-27" itemprop="datePublished">August, 2017</time>):
</span>
<span itemprop="pageStart">1977</span>-<span itemprop="pageEnd">1980</span>.
</blockquote>
<pre><code class="lang-bibtex">
@article{Mottin:2017:NTE:3137765.3137824,
author = {Mottin, Davide
and Lissandrini, Matteo
and Velegrakis, Yannis
and Palpanas, Themis},
title = {New Trends on Exploratory Methods for Data Analytics},
journal = {Proceedings of the VLDB Endowment},
issue_date = {August 2017},
volume = {10},
number = {12},
month = aug,
year = {2017},
issn = {2150-8097},
pages = {1977--1980},
numpages = {4},
doi = {10.14778/3137765.3137824},
acmid = {3137824},
publisher = {VLDB Endowment},
}
</code></pre>
An Evaluation Methodology and Experimental Comparison of Graph Databases2017-04-30T00:00:00Zhttps://people.cs.aau.dk/publications/report/2017-lissandrini-techreport.html<h1 id="an-evaluation-methodology-and-experimental-comparison-of-graph-databases" tabindex="-1">An Evaluation Methodology and Experimental Comparison of Graph Databases</h1>
<h2 id="matteo-lissandrini%2C-martin-brugnara%2C-yannis-velegrakis" tabindex="-1">Matteo Lissandrini, Martin Brugnara, Yannis Velegrakis</h2>
<h3 id="abstract-%3A" tabindex="-1">Abstract:</h3>
<section class="intro secsubead">
</section>
<pre><code>
@techreport{lissandrini:tech,
title = "An Evaluation Methodology and Experimental Comparison of Graph Databases",
author = {Lissandrini, Matteo and Brugnara, Martin and Velegrakis, Yannis},
group = {Data Management Group,DISI},
howpublished = {\url{https://disi.unitn.it/~lissandrini/pdf/lissandrini-techreport-gdb.pdf}},
url = {\url{https://disi.unitn.it/~lissandrini/pdf/lissandrini-techreport-gdb.pdf}},
year = {2017},
note = {\url{https://disi.unitn.it/~lissandrini/pdf/lissandrini-techreport-gdb.pdf}},
institution = {University of Trento},
month = {04},
Date-Added = {2017-04-29 12:00:00},
Date-Modified = {2017-04-30 17:00:00}
}
</code></pre>The Freebase ExQ Data Dump2017-03-15T00:00:00Zhttps://people.cs.aau.dk/notes/freebase-data-dump.html<!-- 10-2023 -->
<section class="intro secsubhead">
<p>
We share here <b>The Freebase ExQ Data Dump</b>: a cleaned version of the triplets from the
<a href="https://en.wikipedia.org/wiki/Freebase_(database)">Freebase knowledge graph</a>, where metadata and (most of the) extraneous relationships have been removed.
The dump is shared in a machine-friendly format.
</p>
<p>
The extracted graph, after cleaning, is a directed unweighted multigraph, containing <code>72,407,365</code> nodes and <code>306,733,220</code> edges with <code>4335</code> distinct edge labels.
</p>
</section>
<p>The complete dump, "Freebase Triples", can be found at
<a href="https://developers.google.com/freebase/">developers.google.com/freebase</a>;
it is no longer updated since
<a href="https://groups.google.com/forum/#!topic/freebase-discuss/WEnyO8f7xOQ">the shutdown of the project</a>.</p>
<p><b>Freebase Data Dumps</b> are provided free of charge for any purpose by Google.
They are distributed, like Freebase itself, under the
<a href="http://creativecommons.org/licenses/by/2.5/">Creative Commons Attribution (aka CC-BY)</a>
and their use is subject to the
<a href="https://developers.google.com/freebase/terms">Freebase Terms of Service</a>.</p>
<p><b>The Freebase ExQ Data Dump</b> (this repository) is distributed under the same license; see below for citing this work.
This dataset has been used in the
<a href="https://people.cs.aau.dk/~matteo/exemplar.html">Exemplar Query project</a>.</p>
<h2 id="reference-this-dataset" tabindex="-1">Reference this dataset</h2>
<p>This dataset is called <b>The Freebase ExQ Data Dump</b>.
If you use this dataset, generate a subsample, or test on this dataset, please use the following reference and link to
<a href="https://people.cs.aau.dk/~matteo/exemplar.html">https://people.cs.aau.dk/~matteo/exemplar.html</a>.</p>
<blockquote>
<p>
Davide Mottin, Matteo Lissandrini, Yannis Velegrakis, Themis Palpanas.
"Exemplar Queries: A New Way of Searching."
The VLDB Journal (2016) 25: 741--765.
</p>
</blockquote>
<pre><code class="lang-bibtex">
@article{Mottin:2016:EQN:3016770.3016789,
author = {Mottin, Davide and Lissandrini, Matteo and Velegrakis, Yannis and Palpanas, Themis},
title = {Exemplar Queries: A New Way of Searching},
journal = {The VLDB Journal},
issue_date = {December 2016},
volume = {25},
number = {6},
month = dec,
year = {2016},
issn = {1066-8888},
pages = {741--765},
numpages = {25},
url = {https://doi.org/10.1007/s00778-016-0429-2},
doi = {10.1007/s00778-016-0429-2},
acmid = {3016789},
publisher = {Springer-Verlag New York, Inc.},
address = {Secaucus, NJ, USA},
keywords = {Exemplar query, Knowledge base, Knowledge graph, Query answering},
}
</code></pre>
<h2 id="content" tabindex="-1">Content</h2>
<p>The dump consists of the following files:</p>
<ul>
<li>
<p><code>freebase-sout.graph</code> <strong>(2GB)</strong>: edges triplets (ordered by the source id)</p>
<ul>
<li>each line is a space-separated triplet <code>source</code> <code>dest</code> <code>label</code>, representing a single edge</li>
<li>edges are sorted by source, and thus a scan in order will give all the outgoing edges of a node</li>
<li><code>source</code> and <code>dest</code> are long integers derived from the Freebase <code>mid</code></li>
</ul>
</li>
<li>
<p><code>freebase-labels.tsv</code>: list of TAB separated 4-tuples, each of which contains:</p>
<ul>
<li>Label ID (Long),</li>
<li>Number of edges with that label,</li>
<li>Freebase official edge label,</li>
<li>tentative human readable label</li>
</ul>
</li>
<li>
<p><code>freebase-nodes-in-out-name.tsv</code> <strong>(802MB)</strong>: list of TAB separated 4-tuples, each of which contains:</p>
<ul>
<li>Node ID (Long)</li>
<li>Node InDegree (could be approximate)</li>
<li>Node OutDegree (could be approximate)</li>
<li>tentative human readable label</li>
</ul>
</li>
<li>
<p><code>freebase-topics.tsv</code>: list of TAB separated values, each line contains:</p>
<ul>
<li>topic name : defined as the first fragment of the edge label</li>
<li>topic frequency : number of edges belonging to this topic;
note that <code>>141</code> million edges belong to type instances (like <code>isA</code> relationships)</li>
</ul>
</li>
<li>
<p><code>org-subsample</code> <strong>(34MB)</strong>: a subsample of Freebase for a selection of domains:</p>
<ul>
<li><code>freebase-org-subsample-sout.graph</code> contains a portion of 4.3M edges from <code>freebase-sout.graph</code></li>
<li><code>selected_labels.tsv</code> lists a portion of <code>freebase-labels.tsv</code>: only edges with labels in this list appear in the subsample</li>
</ul>
</li>
<li>
<p>directory <code>scripts</code>: contains</p>
<ul>
<li><code>mid2long</code> converts Freebase mids to long values, e.g., from <code>/m/0gwsd6y</code> to <code>89546883877148</code></li>
<li><code>long2mid</code> converts Freebase long ids to mids</li>
<li><code>extract_domain.py</code> extracts subgraphs of Freebase given a topic name from <code>freebase-topics.tsv</code>. Requires <code>python 2.7</code>, and <code>networkx</code> if you wish to keep only the largest connected component</li>
</ul>
</li>
</ul>
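<p>Since the edge file is plain space-separated text, standard command-line tools suffice for quick sanity checks. For instance, here is a small sketch that computes the out-degree of each source node in the triplet format described above (shown on an inline toy sample rather than on the real 2GB file):</p>
<pre><code class="language-bash"># each line is: source dest label
printf '1 2 10\n1 3 10\n2 3 11\n' > sample-sout.graph
# count outgoing edges per source node
awk '{out[$1]++} END {for (s in out) print s, out[s]}' sample-sout.graph | sort
# prints:
# 1 2
# 2 1
</code></pre>
<p>Because <code>freebase-sout.graph</code> is sorted by source, the real file can also be scanned sequentially without keeping all nodes in memory.</p>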
<p><a href="https://drive.google.com/open?id=0BwX66B9ISrt4UTlnMDMwWXhEaWM" class="call-to-action" target="_blank" rel="nofollow">Download</a> the files, or part of them; they are stored on
<a href="https://drive.google.com/open?id=0BwX66B9ISrt4UTlnMDMwWXhEaWM">Google Drive</a>.</p>
<h2 id="the-org-subsample-subsample" tabindex="-1">The <code>org-subsample</code> Subsample</h2>
<p>The entire graph is cumbersome to process in many applications, especially for testing purposes.
We generated a relatively small subsample of the graph, containing only a portion of about 4.3 million edges from the entire graph, with a total of 424 edge labels.</p>
<p>We generated a subsample from the following topics:</p>
<ul>
<li>business,</li>
<li>finance,</li>
<li>geography,</li>
<li>government,</li>
<li>military, and</li>
<li>organization.</li>
</ul>
<h2 id="information-about-node-ids" tabindex="-1">Information about Node IDs</h2>
<p>If you want to understand what a node represents, search the file <code>freebase-nodes-in-out-name.tsv</code> for the corresponding node id.
If that search is not enough, you can use <code>grep</code> on the official data dump to search for its <code>mid</code> value (removing the first slash, and replacing the second with a dot).
So, if you care about node <code>89546883877148</code> and you want to search the official dump, convert it to the mid <code>/m/0gwsd6y</code>, replace the characters to obtain <code>m.0gwsd6y</code>, and <code>grep</code> (<code>zgrep</code> on a compressed file) the dump.</p>
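<p>That character munging is easy to script. A minimal sketch of the conversion (the compressed dump file name in the last line is an assumption, adjust it to your copy):</p>
<pre><code class="language-bash"># turn a Freebase mid into the dotted form used in the official dump:
# drop the leading slash, then turn the remaining slash into a dot
mid="/m/0gwsd6y"
key=$(echo "$mid" | sed 's|^/||; s|/|.|')
echo "$key"   # prints m.0gwsd6y
# then, e.g.: zgrep -m 5 "$key" freebase-rdf-latest.gz
</code></pre>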
<p>Mids have been converted into long numbers using the following code:</p>
<pre><code class="lang-java">
/**
 * Convert a mid into a long: a mid is no more than "/m/"
* followed by lower-case letters, digits and _, so it is a base-32 code
* that can be easily converted to binary and then to bigint.
*
* ** NOTE ** Engineering version
* @param mid The original Freebase mid
* @return the converted number
* @throws NullPointerException
* @throws IndexOutOfBoundsException
*/
long convertMidToLong(String mid)
throws NullPointerException, IndexOutOfBoundsException {
String id = mid.substring(mid.lastIndexOf('/') + 1).toUpperCase();
long retval;
String number = "";
for (int i = 0; i < id.length(); i++) {
number = (int)id.charAt(i) + number;
}
retval = Long.valueOf(number);
return retval;
}
</code></pre>
<p>Given a long value one can obtain the Freebase mid with the following code</p>
<pre><code class="lang-java">
/**
 * Opposite of <code>convertMidToLong</code>
* @param decimal
* @return
* @throws NullPointerException
* @throws IndexOutOfBoundsException
*/
String convertLongToMid(long decimal)
throws NullPointerException, IndexOutOfBoundsException {
String mid = "";
String decimalString = decimal + "";
for (int i = 0; i < decimalString.length(); i+= 2) {
mid = (char)Integer.parseInt(decimalString.substring(i, i + 2)) + mid;
}
return "/m/" + mid.toLowerCase();
}
</code></pre>
<h2 id="cleaning-criteria" tabindex="-1">Cleaning Criteria</h2>
<h3 id="metadata-relationships-in-freebase%3B-these-relationships-are-omitted" tabindex="-1">Metadata relationships in Freebase; these relationships are omitted</h3>
<ul>
<li><code>DOMAIN</code> <code>/type/domain</code></li>
<li><code>TOPIC</code> <code>/type/type</code></li>
<li><code>ENTITY</code> <code>/common/topic</code></li>
<li><code>PROPERTY</code> <code>/type/property</code></li>
</ul>
<h3 id="media-and-contextual-information-not-interesting-in-the-knowledge-graph" tabindex="-1">Media and contextual information not interesting in the knowledge graph</h3>
<p>For type relationships we keep only the <code>isA</code>, and not the reverse <code>hasInstance</code></p>
<pre><code class="lang-java">
/**
* Patterns to skip
* removes the line from the tsv dump matching the following patterns
*/
String SKIP_PATTERNS = ".*\\t/user.*|"
+ ".*\\t/freebase/(?!domain_category).*|"
+ ".*/usergroup/.*|"
+ ".*/permission/.*|"
+ ".*\\t/community/.*\\t.*|"
+ ".*\\t/type/object/type\\t.*|"
+ ".*\\t/type/domain/.*\\t.*|"
+ ".*\\t/type/property/(?!expected_type|reverse_property)\\b.*|"
+ ".*\\t/type/(user|content|attribution|extension|link|namespace|permission|reflect|em|karen|cfs|media).*|"
+ ".*\\t/common/(?!document|topic)\\b.*|"
+ ".*\\t/common/document/(?!source_uri)\\b.*|"
+ ".*\\t/common/topic/(description|image|webpage|properties|weblink|notable_for|article).*|"
+ ".*\\t/type/type/(?!domain|instance)\\b.*|"
+ ".*\\t/dataworld/.*\\t.*|"
+ ".*\\t/base/.*\\t.*"
;
</code></pre>
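<p>The same filtering idea can be reproduced in the shell with <code>grep -vE</code> over the TSV dump. A simplified sketch using just one of the patterns above, on a tiny made-up two-line sample (the property names are illustrative):</p>
<pre><code class="language-bash"># drop lines whose predicate is /type/object/type, keep the rest
printf 'a\t/type/object/type\tb\na\t/people/person/place_of_birth\tb\n' \
  | grep -vE $'\t/type/object/type\t'
# prints only the /people/person/place_of_birth line
</code></pre>
<p>The <code>$'...'</code> quoting turns <code>\t</code> into a literal tab, mirroring the <code>\\t</code> anchors of the Java patterns.</p>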
<h2 id="other-dumps" tabindex="-1">Other dumps</h2>
<p>If you are looking for other dumps, you can see the <strong>Freebase Easy</strong> at <a href="http://freebase-easy.cs.uni-freiburg.de/dump/">freebase-easy.cs.uni-freiburg.de</a>, which contains a snapshot of the dump of the Freebase data, which has been enriched with transitive closures, but also largely simplified (and pruned).</p>
<h2 id="feedback" tabindex="-1">Feedback</h2>
<p>If you have any feedback, suggestion, like edges to add/remove, or labels for nodes and edges, or suggested domains, please feel free to contact
<a href="https://people.cs.aau.dk/~matteo">Matteo Lissandrini</a>.</p>
Charting Word Co-occurences like a graph in D3.js2016-10-15T00:00:00Zhttps://people.cs.aau.dk/notes/words-graph.html<!-- 10-2023 -->
<script src="https://d3js.org/d3.v3.min.js"></script>
<section class="intro secsubhead">
<p>
D3.js is a JavaScript library for manipulating the DOM and creating dynamic visualizations.
</p>
</section>
<h2 id="d3.js" tabindex="-1">D3.js</h2>
<p>D3 helps you bring data to life using HTML, SVG, and CSS.
D3’s emphasis on web standards gives you the full capabilities of modern browsers without tying yourself to a proprietary framework, combining powerful visualization components and a data-driven approach to DOM manipulation.</p>
<p>D3 allows you to bind arbitrary data to a Document Object Model (DOM), and then apply data-driven transformations to the document. For example, you can use D3 to generate an HTML table from an array of numbers. Or, use the same data to create an interactive SVG bar chart with smooth transitions and interaction.</p>
<h2 id="the-result" tabindex="-1">The Result</h2>
<figure class="body-figure large">
<svg width="760" height="300"></svg>
</figure>
<p>You can find the code <a href="https://gist.github.com/kuzeko/d9dd2acc5db47397508c1c7549b725bc">in this GitHub gist</a>.
The code parses the current page: for all <code>&lt;p&gt;</code> tags, it sets letters to lowercase and strips stopwords and punctuation.
It then draws the co-occurrence graph.</p>
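<p>The preprocessing steps (lowercase, strip punctuation, drop stopwords, pair up neighbouring words) can be mimicked in the shell. A rough sketch, with an illustrative three-word stopword list, counting adjacent-word co-occurrences:</p>
<pre><code class="language-bash">echo "The quick fox, the quick dog" \
  | tr 'A-Z' 'a-z' | tr -d '[:punct:]' \
  | tr ' ' '\n' | grep -vxE 'the|a|an' \
  | awk 'NR>1 {print prev, $0} {prev=$0}' \
  | sort | uniq -c
# each adjacent pair appears once: fox quick, quick dog, quick fox
</code></pre>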
<script src="https://people.cs.aau.dk/~matteo/files/word-graph.js"></script>
“Learn the Fundamentals” — A Note to Self2016-04-04T00:00:00Zhttps://people.cs.aau.dk/notes/learn-the-fundamentals.html<!-- 10-2023 -->
<blockquote class="long-quote important">
<p>I graduated in Computer Science in the early 2000s.</p>
<p>
When I took a Databases class, NoSQL didn't exist.<br>
When I took a Computer Graphics class, OpenGL didn't support shaders. <br>
When I took a Computer Security class, no one knew about botnets yet. <br>
When I took an Artificial Intelligence class, deep learning didn't exist. <br>
When I took a Programming Languages class, reactive programming wasn't a «thing». <br>
When I took a Distributed Systems class, there was no Big Data or cloud computing. <br>
When I took an Operating Systems class, hypervisors didn't exist (in PCs at least). <br>
When I took a Networking class, there was no wifi in my laptop or internet in my phone. <br>
</p>
<p><strong>Learn the fundamentals. The rest will change anyway.</strong></p>
<p></p>
<footer>
— <cite><a href="https://twitter.com/hisham_hm/status/675845003709702144">“Learn the Fundamentals”</a> by <a href="https://twitter.com/hisham_hm">Hisham H. Muhammad</a></cite>
</footer>
</blockquote>
<section class="intro secsubhead">
<p>
This rings very true to my experience; and yet, more than 8 years ago, when people gave me this very same advice, I did not believe them (most of the time, at least; luckily, not always).
</p>
</section>
<p class="after-heading">
I realize this now, and with it, the irony that the same must be true for so many other people.
I wish I had learnt some more fundamentals, and I wish I had studied them better.
I am doing it now, from time to time, re-learning the fundamentals while casting a condescending eye at my past self.
</p>
<hr>
<p><a href="https://twitter.com/hisham_hm">Hisham H. Muhammad</a> holds a PhD from the Pontifícia Universidade Católica do Rio de Janeiro; he runs his homegrown Linux distro and types in Dvorak in his own text editor (<a href="https://twitter.com/hisham_hm/status/712018122849423360">cit.</a>)</p>
Upgrading HDFS and Hadoop 2.X on a Multi-node cluster with Ubuntu 14.04 2016-02-22T00:00:00Zhttps://people.cs.aau.dk/notes/upgrade-hadoop-on-ubuntu-14.html<!-- 10-2023 -->
<section class="intro secsubhead">
<p>
This guide shows, step by step, how to upgrade a <em>multi-node cluster</em> with <strong>Hadoop and HDFS</strong> from <strong>version 2.4.1</strong> to <strong>version 2.7.2</strong> on <strong>Ubuntu 14.04</strong>.
These instructions make reference to
<a href="https://disi.unitn.it/~lissandrini/notes/installing-hadoop-on-ubuntu-14.html">the setup I described previously</a>.
In general, this should work for any upgrade within the <em>2.X</em> branch, but <u>I cannot guarantee that</u>.
</p>
<p><strong>Note: this is not a high-availability upgrade</strong>: the very first thing I do is shut down my hadoop/hdfs cluster, so you may want to consider
<a href="https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html">this page on the Apache Wiki</a>
instead, but I did not try it.
</p></section>
<p>I will assume you have something similar to what I described in
<a href="https://disi.unitn.it/~lissandrini/notes/installing-hadoop-on-ubuntu-14.html">my previous article</a>: a 3-node cluster. My test case is the following (with IP addresses and shortnames):</p>
<pre><code class="language-bash">10.10.10.104 mynode1
10.10.10.105 mynode2
10.10.10.106 mynode3
</code></pre>
<p class="after-heading">
<strong>Note:</strong> We assume the nodes in the cluster <strong>have the same hardware configuration</strong>, i.e., the same type of architecture.
</p>
<p>I also assume that the <code>hduser</code> exists and is authenticated on all machines, i.e., you should run</p>
<pre><code class="language-bash">sudo su - hduser
cd ~
</code></pre>
<p class="after-heading">
<strong>From now on, in the rest of this guide, all commands will be run as the <code>hduser</code>.</strong>
</p>
<h2 id="setup" tabindex="-1">Setup</h2>
<p class="after-heading">
Make sure you have everything up to date.
</p>
<pre><code class="language-bash">sudo apt-get update
sudo apt-get upgrade
</code></pre>
<p class="after-heading">
<strong>Repeat this installation procedure, up to this point, on every node you have in the cluster.</strong>
</p>
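<p>If you prefer not to repeat the update by hand on every node, you can drive it over <code>ssh</code> from the first node. A dry-run sketch that only prints the command for each host listed above (remove the <code>echo</code> to actually execute it remotely):</p>
<pre><code class="language-bash"># dry run: print the per-node command instead of running it
for h in mynode1 mynode2 mynode3; do
  echo "ssh hduser@$h 'sudo apt-get update; sudo apt-get -y upgrade'"
done
</code></pre>
<p>This assumes the passwordless <code>hduser</code> ssh setup from the previous tutorial is in place.</p>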
<p><strong>The following will be necessary only on the first node:</strong>
we start a <code>screen</code> session, so we can work remotely without fear of losing work if disconnected.</p>
<pre><code class="language-bash">screen -S installing
</code></pre>
<p class="after-heading">
<em>After the <code>-S</code> you can put whatever name for your sessions</em>
</p>
<p>The first thing now is <strong>to stop the cluster</strong> and check that everything is quiet.</p>
<pre><code class="language-bash">stop-dfs.sh
jps
</code></pre>
<p>So yes, if you are looking for a way to avoid downtime while you do this upgrade, <a href="https://google.com/">Google</a> suggested <a href="https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html">this page on the Apache Wiki</a>.</p>
<h2 id="compile-the-sources" tabindex="-1">Compile the Sources</h2>
<p class="after-heading">
The following steps will be needed only once, on the primary node.
I was upgrading from <code>2.4.1</code> to <code>2.7.2</code>, and for some reason the new version required a tool called <code>javah</code> (note the <code>h</code>), which was installed alongside java but placed in some other directory, so I needed to change one line of the <code>.bashrc</code> and/or <code>.profile</code> configuration.
To be more precise, replace the line
</p>
<pre><code class="language-bash">export JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:bin/java::")
</code></pre>
<p>with the line</p>
<pre><code class="language-bash">export JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:jre/bin/java::")
</code></pre>
<p class="after-heading">
so that the <code>JAVA_HOME</code> now points to <code>/usr/lib/jvm/java-8-oracle/</code>
instead of <code>/usr/lib/jvm/java-8-oracle/jre</code>.
</p>
<p>Then, download the hadoop <code>2.X</code> stable sources: navigate the <a href="https://www.apache.org/dyn/closer.cgi/hadoop/core">List of Mirrors</a>, select one, and decide which version to download.
With <code>wget</code> you can run something like the following for hadoop <code>2.7.2</code>:</p>
<pre><code class="language-bash">wget https://mirror.nohup.it/apache/hadoop/common/hadoop-2.7.2/hadoop-2.7.2-src.tar.gz
</code></pre>
<p>Once it has been downloaded, unpack it, enter the directory and compile</p>
<pre><code class="language-bash">tar -xvf hadoop-2.7.2-src.tar.gz
cd hadoop-2.7.2-src/
mvn package -Pdist,native -Dmaven.javadoc.skip=true -DskipTests -Dtar
</code></pre>
<p>The compiled files will be found in <code>hadoop-dist/target/hadoop-2.7.2.tar.gz</code>; just move the archive to your home directory</p>
<pre><code class="language-bash">mv hadoop-dist/target/hadoop-2.7.2.tar.gz ~/
</code></pre>
<p>Now let's copy these files to the other nodes, e.g., from <code>mynode1</code> to <code>mynode2</code> and <code>mynode3</code></p>
<pre><code class="language-bash">scp ~/hadoop-2.7.2.tar.gz hduser@10.10.10.105:~/
scp ~/hadoop-2.7.2.tar.gz hduser@10.10.10.106:~/
</code></pre>
<h2 id="install-the-compiled-code" tabindex="-1">Install the Compiled Code</h2>
<p><strong>The following steps will be needed on all the machines.</strong>
We unpack the compiled version into <code>/usr/local</code>, alongside the old version, and replace the symlink <code>/usr/local/hadoop</code>; this effectively switches the links from the old software to the new.</p>
<pre><code class="language-bash">sudo tar -xvf ~/hadoop-2.7.2.tar.gz -C /usr/local/
sudo rm /usr/local/hadoop
sudo ln -s /usr/local/hadoop-2.7.2 /usr/local/hadoop
sudo chown -R hduser:hadoop /usr/local/hadoop-2.7.2
</code></pre>
<p>We also have to edit the <code>hadoop-env.sh</code> file to set the same <code>$JAVA_HOME</code> variable, which the scripts seem unable to pick up properly, so we open the file with</p>
<pre><code class="language-bash">nano /usr/local/hadoop/etc/hadoop/hadoop-env.sh
</code></pre>
<p>and around line 27 we can replace</p>
<pre><code class="language-bash">export JAVA_HOME=${JAVA_HOME}
</code></pre>
<p>with</p>
<pre><code class="language-bash">JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:jre/bin/java::")
</code></pre>
<p>If you want to be sure it worked, you can print some values, like</p>
<pre><code class="language-bash">echo $JAVA_HOME
echo $HADOOP_HOME
</code></pre>
<h2 id="set-up-all-the-config-files" tabindex="-1">Set up all the config files</h2>
<p>On <em>all the machines</em> we want to carry the configuration over from the previous version to the new one.
So we need to copy the files <code>hdfs-site.xml</code>, <code>core-site.xml</code>, <code>yarn-site.xml</code>, and <code>slaves</code> to the current hadoop directory. Hence, assuming we are copying from a previously installed
<code>2.4.1</code> version, located in <code>/usr/local/hadoop-2.4.1</code>, we run the command</p>
<pre><code class="language-bash"> cp /usr/local/hadoop-2.4.1/etc/hadoop/hdfs-site.xml \
/usr/local/hadoop-2.4.1/etc/hadoop/core-site.xml \
/usr/local/hadoop-2.4.1/etc/hadoop/yarn-site.xml \
/usr/local/hadoop-2.4.1/etc/hadoop/slaves \
/usr/local/hadoop/etc/hadoop/
</code></pre>
<h2 id="initialize-the-upgrade-of-hdfs" tabindex="-1">Initialize the Upgrade of HDFS</h2>
<p><strong>These commands will be used only on the main node, and only once</strong></p>
<p>If all went well we should be able to run the following command</p>
<pre><code class="language-bash">hadoop version
</code></pre>
<p>and obtain something like</p>
<pre><code class="language-bash">Hadoop 2.7.2
Subversion Unknown -r Unknown
Compiled by hduser on 2016-02-19T11:03Z
Compiled with protoc 2.5.0
From source with checksum d0fda26633fa762bff87ec759ebe689c
This command was run using /usr/local/hadoop-2.7.2/share/hadoop/common/hadoop-common-2.7.2.jar
</code></pre>
<p>Now the first step is to <strong>initialize the upgrade</strong>, which will translate the stored data and something else that I'm not really sure about (if you know, please let me know). So on the main node you start the cluster in <code>upgrade</code> mode by running:</p>
<pre><code class="language-bash">start-dfs.sh -upgrade
</code></pre>
<p>This will leave you with a running hadoop cluster on the new software version, without (hopefully) compromising the data, and allowing you to roll back (if you really need it).</p>
<p>You can also check this status on the HDFS Web User Interface: opening the url <code>http://10.10.10.104:50070</code>, a message appears in a blue box at the top saying that an upgrade is in progress.</p>
<h2 id="test-the-data-on-the-cluster!" tabindex="-1">Test the Data on the Cluster!</h2>
<p><strong>These commands will be used only on the main node</strong></p>
<p>And if the previous command didn't complain about anything, we can check the content of the <code>hdfs</code> directory with</p>
<pre><code class="language-bash">hfs -ls /
</code></pre>
<p>so that we know whether the data was lost (<strong>no, it was not!</strong>).
Maybe we can try and export some data to the local disk to check it.
Assume you have a file on <code>hdfs</code> called <code>/datastore/my_file.txt</code>; you can get a local copy with</p>
<pre><code class="language-bash">hfs -copyToLocal /datastore/my_file.txt ./my_file.txt
</code></pre>
<p><em>Note that I'm using my alias for <code>hfs</code>, which is in the <code>.bashrc</code>/<code>.profile</code> file as:</em></p>
<pre><code class="language-bash">alias hfs="hdfs dfs"
</code></pre>
<p>You can find all of them <a href="https://disi.unitn.it/~lissandrini/notes/installing-hadoop-on-ubuntu-14.html#set-up-env-variables">on my previous tutorial</a>.</p>
<h2 id="finalize-the-upgrade" tabindex="-1">Finalize the Upgrade</h2>
<p>Once you are reasonably confident that this upgrade has to be finalized, then you run, <strong>only on the main node</strong>, the command</p>
<pre><code class="language-bash">hdfs dfsadmin -finalizeUpgrade
</code></pre>
<p>Check the Web User Interface again (<code>http://10.10.10.104:50070</code>): the message about the upgrade in progress has disappeared from the top.</p>
<p><strong>And you are done!</strong></p>
Exemplar Queries: A New Way of Searching2015-04-08T00:00:00Zhttps://people.cs.aau.dk/publications/journal/2016-vldbj-exemplar.html<h1 id="exemplar-queries%3A-a-new-way-of-searching" tabindex="-1">Exemplar Queries: A New Way of Searching</h1>
<h2 id="davide-mottin%2C-matteo-lissandrini%2C-themis-palpanas%2C-yannis-velegrakis" tabindex="-1">Davide Mottin, Matteo Lissandrini, Themis Palpanas, Yannis Velegrakis</h2>
<section class="links">
<p>
Download a digital copy (<a class="attachment" href="https://people.cs.aau.dk/~matteo/pdf/vldbj-exemplarqueries.pdf">PDF</a>)
</p><p>
The final authenticated version is available online at <a href="https://doi.org/10.1007/s00778-016-0429-2">10.1007/s00778-016-0429-2</a>.
</p></section>
<h3 id="abstract%3A" tabindex="-1">Abstract:</h3>
<section class="intro secsubead">
Modern search engines employ advanced techniques that go beyond the structures that strictly satisfy the query conditions in an effort to better capture the user intentions.
In this work we introduce a novel query paradigm that considers a user query as an example of the data in which the user is interested.
We call these queries <em>exemplar queries</em>.
We provide a formal specification of their semantics and show that they are fundamentally different from notions like queries by example, approximate queries and related queries.
We provide an implementation of these semantics for knowledge graphs and present an exact solution with a number of optimizations that improve performance without compromising the result quality.
We study two different congruence relations, isomorphism and strong simulation, for identifying the answers to an exemplar query.
We also provide an approximate solution that prunes the search space and achieves considerably better time-performance with minimal or no impact on effectiveness.
The effectiveness and efficiency of these solutions with synthetic and real datasets are experimentally evaluated and the importance of exemplar queries in practice is illustrated.
</section>
<h3 id="cite%3A" tabindex="-1">Cite:</h3>
<blockquote itemscope="" itemtype="http://schema.org/ScholarlyArticle">
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Mottin, Davide</span>;
</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Lissandrini, Matteo</span>;
</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Velegrakis, Yannis</span>;
</span>
and
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Palpanas, Themis</span>.
</span>
<br>
“<span itemprop="headline name">Exemplar Queries: A New Way of Searching</span>.”
<br>
<div class="hidden">
<time datetime="2016-12-01" itemprop="datePublished">December, 2016</time>
<span itemprop="image">https://people.cs.aau.dk/~matteo/images/xq-logo.png</span>
</div>
<span itemprop="isPartOf" itemscope="" itemtype="http://schema.org/Periodical">
<em><span itemprop="name">The VLDB Journal</span></em>
</span>, <span itemprop="isPartOf" itemscope="" itemtype="http://schema.org/PublicationVolume"> <span itemprop="volumeNumber">25</span></span>, <span itemprop="isPartOf" itemscope="" itemtype="http://schema.org/PublicationIssue">
(<span itemprop="issueNumber">6</span>)
(<time datetime="2016-12-01" itemprop="datePublished">December, 2016</time>):
</span>
<span itemprop="pageStart">741</span>-<span itemprop="pageEnd">765</span>.
</blockquote>
<pre><code class="lang-bibtex">
@article{Mottin:2016:EQN:3016770.3016789,
author = {Mottin, Davide and Lissandrini, Matteo and Velegrakis, Yannis and Palpanas, Themis},
title = {Exemplar Queries: A New Way of Searching},
journal = {The VLDB Journal},
issue_date = {December 2016},
volume = {25},
number = {6},
year = {2016},
issn = {1066-8888},
pages = {741--765},
numpages = {25},
url = { https://doi.org/10.1007/s00778-016-0429-2 },
doi = {10.1007/s00778-016-0429-2},
acmid = {3016789},
publisher = {Springer-Verlag New York, Inc.},
address = {Secaucus, NJ, USA},
keywords = {Exemplar query, Knowledge base, Knowledge graph, Query answering},
}
</code></pre>
Unleashing the Power of Information Graphs2014-12-01T00:00:00Zhttps://people.cs.aau.dk/publications/journal/2014-sigmod-record-exemplar.html<h1 id="unleashing-the-power-of-information-graphs" tabindex="-1">Unleashing the Power of Information Graphs</h1>
<h2 id="matteo-lissandrini%2C-davide-mottin%2C-dimitra-papadimitriou%2C-themis-palpanas%2C-yannis-velegrakis" tabindex="-1">Matteo Lissandrini, Davide Mottin, Dimitra Papadimitriou, Themis Palpanas, Yannis Velegrakis</h2>
<section class="links">
<p>
Download a digital copy (<a class="attachment" href="https://people.cs.aau.dk/~matteo/pdf/SIGREC14-exemplarqueries.pdf">PDF</a>)
</p><p>
The final authenticated version is available online at <a href="https://doi.org/10.1145/2737817.2737822">10.1145/2737817.2737822</a>.
</p></section>
<h3 id="abstract%3A" tabindex="-1">Abstract:</h3>
<section class="intro secsubead">
Information graphs are generic graphs that model different types of information through nodes and edges.
Knowledge graphs are the most common type of information graphs in which nodes represent entities and edges represent relationships among them.
In this paper, we argue that exploitation of information graphs can lead into novel query answering capabilities that go beyond the existing capabilities of keyword search, and focus on one of them, namely, exemplar queries.
Exemplar queries is a recently introduced paradigm that treats a user query as an example from the desired result set.
In this paper, we describe the foundations of exemplar queries and the significant role of information graphs, and we present several applications and relevant research directions.
</section>
<h3 id="cite%3A" tabindex="-1">Cite:</h3>
<blockquote itemscope="" itemtype="http://schema.org/ScholarlyArticle">
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Lissandrini, Matteo</span>;
</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Mottin, Davide</span>;
</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Palpanas, Themis</span>;
</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Papadimitriou, Dimitra</span>;
</span>
and
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Velegrakis, Yannis</span>.
</span>
<br>
“<span itemprop="headline name">Unleashing the Power of Information Graphs</span>.”
<br>
<div class="hidden">
<time datetime="2014-01-27" itemprop="datePublished">December, 2014</time>
<span itemprop="image">https://people.cs.aau.dk/~matteo/images/xq-logo.png</span>
</div>
<span itemprop="isPartOf" itemscope="" itemtype="http://schema.org/Periodical">
<em><span itemprop="name">ACM SIGMOD Record</span></em>
</span>, <span itemprop="isPartOf" itemscope="" itemtype="http://schema.org/PublicationVolume">
<span itemprop="volumeNumber">43</span>
</span>, <span itemprop="isPartOf" itemscope="" itemtype="http://schema.org/PublicationIssue">
(<span itemprop="issueNumber">4</span>)
(<time datetime="2014-12-01" itemprop="datePublished">December, 2014</time>):
</span>
<span itemprop="pageStart">21</span>-<span itemprop="pageEnd">26</span>.
</blockquote>
<pre><code class="lang-bibtex">
@article{Lissandrini:2015:UPI:2737817.2737822,
author = {Lissandrini, Matteo and Mottin, Davide and Palpanas, Themis and Papadimitriou, Dimitra and Velegrakis, Yannis},
title = {Unleashing the Power of Information Graphs},
journal = {ACM SIGMOD Record},
issue_date = {December 2014},
volume = {43},
number = {4},
month = feb,
year = {2015},
issn = {0163-5808},
pages = {21--26},
numpages = {6},
url = {http://doi.acm.org/10.1145/2737817.2737822},
doi = {10.1145/2737817.2737822},
acmid = {2737822},
publisher = {ACM},
address = {New York, NY, USA},
}
</code></pre>
Web Search: From The Noun to The Verb. Takeaways from Prabhakar Raghavan's keynote at ISWC'142014-10-28T00:00:00Zhttps://people.cs.aau.dk/notes/iswc-2014-keynote.html<!-- 10-2023 -->
<section class="intro secsubhead">
<p>
I had the chance to participate in this year's
<a href="https://iswc2014.semanticweb.org/">International Semantic Web Conference (ISWC2014)</a>
in Riva del Garda and to attend the
<a href="https://iswc2014.semanticweb.org/keynote_prabhakar_raghavan.html">keynote</a>
given by
<a href="https://research.google.com/pubs/PrabhakarRaghavan.html">Prabhakar Raghavan</a>, Vice President of Engineering at Google.
</p>
<p>
It was a really interesting talk, full of content, but also enjoyable and very approachable for non-technical people too.
These are my personal takeaways from his keynote.
I highly recommend you watch
<a href="https://videolectures.net/iswc2014_raghavan_web_search/">the keynote video that has been published online</a>.
</p>
</section>
<figure class="body-figure">
<img src="https://people.cs.aau.dk/~matteo/images/ISWC-logo.png" alt="ISWC 2014 Logo">
<figcaption>
<a href="https://iswc2014.semanticweb.org/">ISWC 2014</a> in Riva del Garda (Italy)
</figcaption>
</figure>
<p>Prabhakar presented a nice overview of the evolution of search engines over the last 20 years.
He started from the first innovations, like Continuous Crawling in 1995, and the introduction of Recall as a measure of result quality.
He then went through the problem of ranking:
starting with the
"<a href="https://en.wikipedia.org/wiki/Inktomi">Inktomi</a> scoring function", and then the concept of
<a href="https://en.wikipedia.org/wiki/PageRank">PageRank</a>,
where a link to a page is a signal of endorsement or, alternatively, a description of its content.
This highlighted how fundamental it was, and still is, to evaluate the "importance" of a page not only based on its content.</p>
<figure class="body-figure">
<img src="https://upload.wikimedia.org/wikipedia/en/thumb/8/8b/PageRanks-Example.jpg/581px-PageRanks-Example.jpg" alt="An example of Page Rank">
<figcaption>
<a href="https://en.wikipedia.org/wiki/PageRank">PageRanks</a>
computed for a simple network
</figcaption>
</figure>
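<p>To make the endorsement intuition concrete, here is a toy power-iteration sketch. The 3-page graph, the damping factor, and the iteration count below are all made up for illustration; this is of course not how a production search engine computes ranks at scale:</p>
<pre><code class="language-bash"># Toy PageRank on the graph A->B, A->C, B->C, C->A, with damping 0.85.
# Each page splits its rank evenly among its outgoing links.
ranks=$(awk 'BEGIN {
  d = 0.85; n = 3;
  pa = pb = pc = 1.0 / n;
  for (i = 0; i < 100; i++) {
    na = (1 - d) / n + d * (pc / 1);           # A is linked by C (1 outlink)
    nb = (1 - d) / n + d * (pa / 2);           # B is linked by A (2 outlinks)
    nc = (1 - d) / n + d * (pa / 2 + pb / 1);  # C is linked by A and B
    pa = na; pb = nb; pc = nc;
  }
  printf "A=%.3f B=%.3f C=%.3f", pa, pb, pc;
}')
echo "$ranks"
</code></pre>
<p>The pages with the most incoming "endorsements" (here C, and A, which C endorses) end up ranked above B, regardless of their content.</p>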
<p>Then he moved on to why people run "queries" in the first place:
from searching for a URL (navigational queries), to searching for knowledge (informational queries), to transactional queries,
i.e., queries a user performs in order to find a product to buy.
Additionally, he highlighted that nowadays approximately <code>99%</code> of queries contain entities.
One outstanding example is what he called the "party factoid", i.e., Wikipedia fact-checking during parties.</p>
<p>He then explained that there are now queries for fine-grained, everyday-life tasks issued through mobile devices,
so understanding a query is highly dependent on the context.
This is what he called "understanding the verb" implicit in the query.
For example, you may look for "restaurant", but:</p>
<ul>
<li>to <em>select</em> a good restaurant and get information</li>
<li>to <em>book</em> a restaurant you selected</li>
<li>to find the way to <em>go to</em> the restaurant</li>
</ul>
<p>Then he made an important remark: for the past 15 years a query was made of 2.5 words on average, but things are about to change with the advent of natural language questions (e.g., <code>ask Siri</code>).</p>
<p>In the end, from what I understood, his vision of the future revolves around three main key points:</p>
<ul>
<li>understanding the context: the user intent given by the context</li>
<li>dealing with more complex queries (natural language questions)</li>
<li>forecasting the next query before the user issues it (Google Now)</li>
</ul>
<p>A nice remark about the last point: if we can predict the next query, then we can execute it when it is more convenient for the servers, i.e., when the load is lower.</p>
Extracting Link Titles from Wikipedia pages2014-10-21T00:00:00Zhttps://people.cs.aau.dk/notes/wikipedia-titles-extractor.html<!-- 10-2023 -->
<section class="intro secsubhead">
<p>
I wanted to extract all the text from the links in a Wikipedia article.
Especially for important pages this list can be extremely long, so I wrote a couple of lines of JavaScript that extract exactly that.
</p>
</section>
<p>The script identifies all the links with a title in the body, plus all the links in the table of contents (TOC).
From those it removes the ones whose title contains a colon <code>:</code>, which are service URLs.
The collected titles - those longer than 2 letters - are then kept in the <code>titles</code> variable.
Notice that this is actually jQuery syntax, which we can use because Wikipedia already loads jQuery.</p>
<pre><code class="language-js">var titles = [];
$('#bodyContent a[title], .toctext')
.not('[title*=":"]')
.each(function(){
if($(this).text().length > 2) {
titles.push($(this).text())
}
});
</code></pre>
<p>Additionally, we can retrieve titles from the footnotes of the page. We assume that the important part is between quotes, as these are usually titles of books, articles, and so on.
Thus we use a JS regular expression to extract only the bit of footnote text surrounded by quotes.</p>
<pre><code class="language-js">var rexp = new RegExp("\\\"(.+)\\\"", "g");
$('.external.text')
.each(function(){
var title = $(this).text().replace(rexp,"$1");
if(title.length > 2) {
titles.push(title)
}
});
</code></pre>
<p>While we are at it handling variables, we can also sort the list of titles.</p>
<pre><code class="language-js"> titles.sort();
</code></pre>
<p>Now the variable is somewhere inside the browser, but we want an actionable deliverable: a file.
To <em>transmute</em> a js variable into a file we can use the following code from <a href="https://github.com/bgrins/devtools-snippets/blob/master/snippets/console-save/console-save.js">here</a>:</p>
<pre><code class="language-js">(function(console){
  console.save = function(data, filename){
    if(!data){
      console.error('Console.save: No data');
      return;
    }
    if(!filename) {
      filename = 'console.json';
    }
    if(typeof data === "object"){
      data = JSON.stringify(data, undefined, 4);
    }
    // Wrap the data in a Blob and simulate a click on a download link
    var blob = new Blob([data], {type: 'text/json'}),
        e = document.createEvent('MouseEvents'),
        a = document.createElement('a');
    a.download = filename;
    a.href = window.URL.createObjectURL(blob);
    a.dataset.downloadurl = ['text/json', a.download, a.href].join(':');
    e.initMouseEvent('click', true, false, window, 0, 0, 0, 0, 0, false, false, false, false, 0, null);
    a.dispatchEvent(e);
  }
})(console)
</code></pre>
<p>At this point, the following line of javascript will (magically) ask your browser to download a <code>.json</code> file containing your variable</p>
<pre><code class="language-js">console.save( titles, "wikipedia-titles-filename.json" )
</code></pre>
<p>In order to automatically save a file with a <strong>meaningful</strong> filename we can use something like the current page <code>URL</code></p>
<pre><code class="language-js">var fileName = (location.href.split('/').reverse()[0]) + '-titles.json';
console.save(titles, fileName);
</code></pre>
<p>Now we can concatenate all of those into a <a href="https://people.cs.aau.dk/~matteo/files/wikipedia-extractor-complete.js">single js file</a> and use a link to the external file to create the code for a <a href="https://code.tutsplus.com/tutorials/create-bookmarklets-the-right-way--net-18154">bookmarklet</a> like this:</p>
<pre><code class="language-js">javascript: (function(){
var titlesExtractorUrl = 'https://people.cs.aau.dk/~matteo/files/wikipedia-extractor-complete.js';
var jsCodeExtract = document.createElement('script');
jsCodeExtract.setAttribute('src', titlesExtractorUrl);
document.body.appendChild(jsCodeExtract);
}());
</code></pre>
<p>You can then compress it with <a href="https://jscompress.com/">https://jscompress.com/</a> or <a href="https://javascript-minifier.com/">https://javascript-minifier.com/</a>, but <em>mind the gap!</em> The compressed files use double quotes <code>"</code> to delimit strings; I suggest you replace them with single quotes when you turn the code into a bookmarklet to be put in the <code>href</code> attribute of a link like the one below.</p>
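<p>The quote swap itself can be automated from the command line; here is a small sketch with <code>sed</code> (the minified one-liner is assigned to a variable just for the example):</p>
<pre><code class="language-bash"># Turn every double quote in the minified code into a single quote,
# so the result can live inside a double-quoted href="..." attribute.
minified='javascript:(function(){var e="https://people.cs.aau.dk/~matteo/files/wikipedia-extractor-complete.js";var t=document.createElement("script");t.setAttribute("src",e);document.body.appendChild(t)})()'
bookmarklet=$(printf '%s' "$minified" | sed "s/\"/'/g")
printf '%s\n' "$bookmarklet"
</code></pre>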
<p>Here is the <a class="bookmarklet" href="javascript: (function(){var e='https://people.cs.aau.dk/~matteo/files/wikipedia-extractor-complete.js';var t=document.createElement('script');t.setAttribute('src',e);document.body.appendChild(t)})()">Wikipedia link Title Extractor</a> bookmarklet. <em>Drag this link to your bookmark bar!</em></p>
“Changing Minds” — An Excerpt2014-09-04T00:00:00Zhttps://people.cs.aau.dk/notes/excerpt-changing-minds.html<!-- 10-2023 -->
<blockquote class="important">
<p>Every good new system enlarges the set of ways we can think about the world.
If we happen to have in hand a system that is apt for learning or inquiring into a new area, we make progress quickly.
</p>
<footer>
—
<cite>
<a href="https://mitpress.mit.edu/books/changing-minds">“Changing Minds”</a>
by Andy di Sessa
</cite>
</footer>
</blockquote>
<section class="intro secsubhead">
<p>
<a href="https://worrydream.com/">Bret Victor</a>
has
<a href="http://worrydream.com/#!/Bio">the goal</a>
to “revolutionize how people learn, understand, and create”.
There is something equally inspiring in this particular
<a href="http://worrydream.com/oatmeal/changing-minds.jpg">excerpt he shared</a>.
</p>
</section>
<p class="after-heading">
In light of this, while doing research, reading papers, or pursuing the wild path towards knowledge, I think it is important to ask ourselves whether we are using the right tools or, even better, whether we are actually providing the world with better means to foster progress and science.
</p>
Installing HDFS and Hadoop 2.X on a Multi-node cluster with Ubuntu 14.04 2014-08-26T00:00:00Zhttps://people.cs.aau.dk/notes/installing-hadoop-on-ubuntu-14.html<!-- 10-2023 -->
<section class="intro secsubhead">
<p>
This guide shows, step by step, how to set up a <em>multi-node cluster</em> with <strong>Hadoop and HDFS 2.4.1 on Ubuntu 14.04</strong>.
It is an update, and takes many parts from previous guides about installing
<a href="https://hadoop.apache.org/docs/stable/">Hadoop&HDFS</a>
versions
<a href="https://n0where.net/hadoop-2-2-multi-node-cluster-setup/">2.2</a>
and
<a href="https://www.elcct.com/installing-hadoop-2-3-0-on-ubuntu-13-10/">2.3</a>
on Ubuntu.
</p>
</section>
<p>The text here is quite lengthy; I will soon provide a script to automate some parts.</p>
<p>Assume we have a 3-node cluster; my test case is the following (with IP addresses and shortnames):</p>
<pre><code class="language-bash">10.10.10.104 mynode1
10.10.10.105 mynode2
10.10.10.106 mynode3
</code></pre>
<p><strong>Note:</strong> We assume the nodes in the cluster <strong>have the same hardware configuration</strong>, i.e., the same type of architecture.</p>
<h2 id="setup" tabindex="-1">Setup</h2>
<p>Make sure you have Oracle JDK 7 or 8 installed.
<em>The following are the commands for java 8; to install java 7 you just need to change the version number</em></p>
<pre><code class="language-bash">sudo add-apt-repository ppa:webupd8team/java -y
sudo apt-get update
sudo apt-get install oracle-java8-installer
sudo apt-get install oracle-java8-set-default
</code></pre>
<p><em>Note:</em> I know some of you are trying to follow this guide on Debian. I am not sure how much of this guide applies to that OS, but for this specific step, for Debian,
<a href="https://www.webupd8.org/2014/03/how-to-install-oracle-java-8-in-debian.html">the instructions to install Java 8 are here</a>.</p>
<p>While we are installing software, you may find it useful to also install <code>screen</code>, to start work sessions on remote servers, and <code>nmap</code>, to check server ports in case something in the cluster networking is not working</p>
<pre><code class="language-bash">sudo apt-get install screen nmap
</code></pre>
<p><strong>Repeat this installation procedure, up to this point, on every node you have in the cluster</strong></p>
<p><strong>The following will be necessary only on the first node:</strong>
we start a screen session so we can work remotely without fear of losing our work if disconnected.</p>
<pre><code class="language-bash">screen -S installing
</code></pre>
<p><em>After the <code>-S</code> you can put whatever name you like for your session</em></p>
<p>Now we are going to actually install the software needed to compile hdfs&hadoop: <code>maven</code> along with the required libraries.</p>
<pre><code class="language-bash">sudo apt-get install maven build-essential zlib1g-dev cmake pkg-config libssl-dev protobuf-compiler
</code></pre>
<p>Among these packages, <code>protoc</code>, also called <code>protobuf-compiler</code>, may cause version problems depending on your operating system version.
In that case, you can
<a href="https://github.com/y12studio/y12hadoop/blob/master/install-hadoop-2.4.0.md#proto-buffer-250">compile and install the correct version (<code>2.5.0</code>) from the source</a>.</p>
<h2 id="hadoop-user-%26-authentication" tabindex="-1">Hadoop User & Authentication</h2>
<p>Next, let's create the <code>hadoop</code> group and the user <code>hduser</code>, which will also be in the sudoers; the following commands have to be run one at a time.
In the second step, <code>adduser</code> will also ask for the login password of <code>hduser</code>:</p>
<pre><code class="language-bash">sudo addgroup hadoop
sudo adduser --ingroup hadoop hduser
sudo adduser hduser sudo
</code></pre>
<p><strong>Repeat this procedure, up to this point, on every node you have in the cluster</strong></p>
<p>We now log in as the new <code>hduser</code> on one node and create the SSH keys to access the other servers:</p>
<pre><code class="language-bash">sudo su - hduser
</code></pre>
<p><strong>From now on, in the rest of this guide, all commands will be run as the <code>hduser</code>.</strong></p>
<pre><code class="language-bash">ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
</code></pre>
<p>Now let's copy these files to the other nodes, e.g., from <code>mynode1</code> to <code>mynode2</code> and <code>mynode3</code></p>
<pre><code class="language-bash">scp -r ~/.ssh hduser@10.10.10.106:~/
</code></pre>
<h2 id="compile-the-sources" tabindex="-1">Compile the Sources</h2>
<p>The following steps will be needed only once.
Download the hadoop <code>2.X</code> stable sources: navigate the
<a href="https://www.apache.org/dyn/closer.cgi/hadoop/core">List of Mirrors</a>,
select one, and decide which version to download.
With <code>wget</code> you can run something like the following for hadoop <code>2.4.1</code> - from Europe:</p>
<pre><code class="language-bash">wget https://www.eu.apache.org/dist/hadoop/core/hadoop-2.4.1/hadoop-2.4.1-src.tar.gz
</code></pre>
<p>From the U.S. instead</p>
<pre><code class="language-bash">wget https://apache.mirror.anlx.net/hadoop/core/hadoop-2.4.1/hadoop-2.4.1-src.tar.gz
</code></pre>
<p>Once it has been downloaded, unpack it</p>
<pre><code class="language-bash">tar -xvf hadoop-2.4.1-src.tar.gz
</code></pre>
<p>Then enter the directory and compile</p>
<pre><code class="language-bash">cd hadoop-2.4.1-src/
mvn package -Pdist,native -Dmaven.javadoc.skip=true -DskipTests -Dtar
</code></pre>
<p>Notice that, if you are behind a proxy,
<a href="https://maven.apache.org/settings.html">maven needs a <code>settings.xml</code></a>
file in the configuration directory in <code>~/.m2</code> that contains
<a href="https://maven.apache.org/guides/mini/guide-proxies.html">the basic information of your proxy configuration</a>.</p>
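<p>As a minimal sketch, such a <code>~/.m2/settings.xml</code> could look like the following (host and port are placeholders you must replace with your own proxy's values):</p>
<pre><code class="language-xml"><settings>
  <proxies>
    <proxy>
      <id>example-proxy</id>
      <active>true</active>
      <protocol>http</protocol>
      <host>proxy.example.org</host>
      <port>8080</port>
    </proxy>
  </proxies>
</settings>
</code></pre>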
<p>The compiled files will be found in <code>hadoop-dist/target/hadoop-2.4.1.tar.gz</code>; just move the archive to the home directory</p>
<pre><code class="language-bash">mv hadoop-dist/target/hadoop-2.4.1.tar.gz ~/
</code></pre>
<p>Now let's copy these files to the other nodes, e.g., from <code>mynode1</code> to <code>mynode2</code> and <code>mynode3</code></p>
<pre><code class="language-bash">scp ~/hadoop-2.4.1.tar.gz hduser@10.10.10.105:~/
scp ~/hadoop-2.4.1.tar.gz hduser@10.10.10.106:~/
</code></pre>
<h2 id="install-the-compiled-code" tabindex="-1">Install the Compiled Code</h2>
<p><strong>The following steps will be needed on all the machines.</strong>
We unpack the compiled version, put it in <code>/usr/local</code>, and create a symbolic link called <code>/usr/local/hadoop</code></p>
<pre><code class="language-bash">sudo tar -xvf ~/hadoop-2.4.1.tar.gz -C /usr/local/
sudo ln -s /usr/local/hadoop-2.4.1 /usr/local/hadoop
sudo chown -R hduser:hadoop /usr/local/hadoop-2.4.1
</code></pre>
<h2 id="set-up-env-variables" tabindex="-1">Set up ENV Variables</h2>
<p><strong>The following steps will be needed on all the machines.</strong>
We update the profile of the shell, i.e., we edit the <code>.profile</code> file to set some environment variables; in order to upset <code>vim</code> and <code>emacs</code> users equally, we will use a text editor called <code>nano</code></p>
<pre><code class="language-bash">nano ~/.profile
</code></pre>
<p>And we add, at the end, the following</p>
<pre><code class="language-bash">export JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:bin/java::")
export HADOOP_INSTALL=/usr/local/hadoop
export HADOOP_HOME=$HADOOP_INSTALL
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export HADOOP_CONF_DIR=${HADOOP_HOME}"/etc/hadoop"
export YARN_HOME=$HADOOP_INSTALL
alias hfs="hdfs dfs"
</code></pre>
<p>(To save <code>CTRL+o</code> <code>ENTER</code> and <code>CTRL+x</code> )</p>
<p><em>Note:</em> If you installed hadoop somewhere else, check the proper directory path for <code>$HADOOP_INSTALL</code>, but do not change <code>$HADOOP_CONF_DIR</code>.</p>
<p>Now we make the edits effective by reloading the <code>.profile</code> file with</p>
<pre><code class="language-bash">source ~/.profile
</code></pre>
<p>We also have to edit the <code>hadoop-env.sh</code> file to set the same <code>$JAVA_HOME</code> variable, which it seems unable to pick up properly, so we open the file with</p>
<pre><code class="language-bash">nano /usr/local/hadoop/etc/hadoop/hadoop-env.sh
</code></pre>
<p>and around line 27 we can replace</p>
<pre><code class="language-bash">export JAVA_HOME=${JAVA_HOME}
</code></pre>
<p>with</p>
<pre><code class="language-bash">export JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:bin/java::")
</code></pre>
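<p>If you are curious about what that pipeline computes, you can try the <code>sed</code> part on its own; the path below is just an example of where the Oracle JDK typically resolves to, yours may differ:</p>
<pre><code class="language-bash"># Strip the trailing "bin/java" from a resolved java binary path
# to obtain the JDK root (example path, not necessarily yours)
java_path="/usr/lib/jvm/java-8-oracle/jre/bin/java"
java_root=$(echo "$java_path" | sed "s:bin/java::")
echo "$java_root"
</code></pre>
<p>Note the use of <code>:</code> instead of <code>/</code> as the <code>sed</code> delimiter, so the slashes in the path do not need escaping.</p>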
<p>If you want to be sure it worked, you can print some of the values, like</p>
<pre><code class="language-bash">echo $JAVA_HOME
echo $HADOOP_HOME
</code></pre>
<h2 id="set-up-data-directory-%26-logs" tabindex="-1">Set up Data Directory & Logs</h2>
<p>We create the directories where <code>hdfs</code> data files and logs will be stored; you can place them wherever you like.</p>
<p>The first directory is actually needed only on the NameNode (main) machine</p>
<pre><code class="language-bash">mkdir -pv /usr/local/hadoop/data/namenode
</code></pre>
<p><strong>These steps will be needed on all the machines</strong></p>
<pre><code class="language-bash">mkdir -pv /usr/local/hadoop/data/datanode
mkdir -pv $HADOOP_INSTALL/logs
</code></pre>
<h2 id="edit-configuration-files" tabindex="-1">Edit Configuration Files</h2>
<p><strong>These steps will be needed only on the main machine, then we will copy the entire conf directory on the other machines</strong></p>
<p>We put this information in the <code>hdfs-site.xml</code> file with</p>
<pre><code class="language-bash">nano $HADOOP_INSTALL/etc/hadoop/hdfs-site.xml
</code></pre>
<p>And paste the following inside the <code><configuration></code> tag:</p>
<pre><code class="language-xml"><property>
<name>dfs.datanode.data.dir</name>
<value>file:///usr/local/hadoop/data/datanode</value>
<description>DataNode directory</description>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///usr/local/hadoop/data/namenode</value>
<description>NameNode directory for namespace and transaction logs storage.</description>
</property>
</code></pre>
<p>The following are additional configuration parameters to put alongside the previous ones; among them, the replication parameter sets the number of redundant copies we want - it does not need to match the number of nodes in the cluster.</p>
<pre><code class="language-xml"><property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>dfs.datanode.use.datanode.hostname</name>
<value>false</value>
</property>
<property>
<name>dfs.namenode.datanode.registration.ip-hostname-check</name>
<value>false</value>
</property>
</code></pre>
<p><strong>Notice:</strong> when you start your HDFS distributed filesystem, you will have a main <code>NameNode</code> and a <code>Secondary NameNode</code>.
The <code>Secondary NameNode</code> is
<em>
<a href="https://wiki.apache.org/hadoop/FAQ#What_is_the_purpose_of_the_secondary_name-node.3F">not what you think</a>
<a href="https://www.youtube.com/watch?v=hEqQMLSXQlY">it is</a></em>.</p>
<blockquote class="important">
<p>The term "secondary name-node" is somewhat misleading. It is not a name-node in the sense that data-nodes cannot connect to the secondary name-node, and in no event it can replace the primary name-node in case of its failure.
</p>
<footer>
—
<cite>
<a href="https://wiki.apache.org/hadoop/FAQ#What_is_the_purpose_of_the_secondary_name-node.3F">From Hadoop FAQ</a>
</cite>
</footer>
</blockquote>
<p>In any case, you may want to put the Secondary NameNode on a different machine from the master, e.g., one of the workers.
Assume you decide your cluster's main node is</p>
<pre><code class="language-bash">10.10.10.104 mynode1
</code></pre>
<p>and assume you decide to host the Secondary NameNode on</p>
<pre><code class="language-bash">10.10.10.105 mynode2
</code></pre>
<p>then we add the following to the <code>hdfs-site.xml</code> file :</p>
<pre><code class="language-xml"><property>
<name>dfs.namenode.http-address</name>
<value>10.10.10.104:50070</value>
<description>Your NameNode hostname for http access.</description>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>10.10.10.105:50090</value>
<description>Your Secondary NameNode hostname for http access.</description>
</property>
</code></pre>
<p><em>I thank my colleague
<a href="https://members.loria.fr/SAridhi/">Sabeur</a>
for helping me with this bit on the Secondary NameNode</em>.</p>
<p>Then we also point the Hadoop cluster to the <code>mynode1</code> IP, to tell it where we host the hadoop <code>NameNode</code>, by editing:</p>
<pre><code class="language-bash">nano $HADOOP_INSTALL/etc/hadoop/core-site.xml
</code></pre>
<p>and we add inside the <code><configuration></code> tag the following</p>
<pre><code class="language-xml"><property>
<name>fs.defaultFS</name>
<value>hdfs://10.10.10.104/</value>
<description>NameNode URI</description>
</property>
</code></pre>
<p>We put the IP addresses of the nodes to be used as <code>DataNodes</code> in the <code>slaves</code> file; we open it with</p>
<pre><code class="language-bash"> nano $HADOOP_INSTALL/etc/hadoop/slaves
</code></pre>
<p>And we put the list of server addresses, one per line; note that in this case the master is also used as a DataNode, so we put there the following list:</p>
<pre><code class="language-bash"> 10.10.10.104
10.10.10.105
10.10.10.106
</code></pre>
<p>Everything up to here was mainly about <code>HDFS</code>; now we configure the <code>yarn</code> cluster, i.e., the execution engine. We then edit the <code>yarn-site.xml</code>.</p>
<pre><code class="language-bash"> nano $HADOOP_INSTALL/etc/hadoop/yarn-site.xml
</code></pre>
<p>Again we add the following inside the <code><configuration></code> tag</p>
<pre><code class="language-xml"><property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>10.10.10.104:8025</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>10.10.10.104:8030</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>10.10.10.104:8050</value>
</property>
</code></pre>
<p>Now it is time to update all the nodes with this new configuration, so we copy the configuration directory from <code>mynode1</code> to <code>mynode2</code> and <code>mynode3</code> with the following commands (<em>note the destination directory</em>)</p>
<pre><code class="language-bash">scp -r $HADOOP_INSTALL/etc/hadoop hduser@10.10.10.105:$HADOOP_INSTALL/etc/
scp -r $HADOOP_INSTALL/etc/hadoop hduser@10.10.10.106:$HADOOP_INSTALL/etc/
</code></pre>
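As a quick sanity check (a sketch, assuming the same <code>hduser</code> account and paths used above), you can compare the checksum of a configuration file on the master with the copy shipped to a worker:

```shell
# Checksum of the local copy on mynode1
md5sum $HADOOP_INSTALL/etc/hadoop/core-site.xml
# Checksum of the copy on mynode2; the two sums should match
ssh hduser@10.10.10.105 "md5sum \$HADOOP_INSTALL/etc/hadoop/core-site.xml"
```

If <code>$HADOOP_INSTALL</code> is not exported in non-interactive shells on the worker, substitute the full path (e.g., the install directory used earlier) in the remote command.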
<h2 id="initialize-hdfs" tabindex="-1">Initialize HDFS</h2>
<p><strong>These commands will be used only on the main node</strong></p>
<p>If all went well, we should be able to run the following command</p>
<pre><code class="language-bash">hadoop version
</code></pre>
<p>and obtain something like</p>
<pre><code class="language-bash">Hadoop 2.4.1
Subversion Unknown -r Unknown
Compiled by hduser on 2014-08-23T15:29Z
Compiled with protoc 2.5.0
From source with checksum bb7ac0a3c73dc131f4844b873c74b630
This command was run using /usr/local/hadoop-2.4.1/share/hadoop/common/hadoop-common-2.4.1.jar
</code></pre>
<p>Now the first step is to <code>format</code> the NameNode; this will initialize the <code>hdfs</code> file system. So on the main node you run:</p>
<pre><code class="language-bash">hdfs namenode -format
</code></pre>
<blockquote>
<p>Hadoop NameNode is the centralized place of an HDFS file system which keeps the directory tree of all files in the file system, and tracks where across the cluster the file data is kept. In short, it keeps the metadata related to datanodes. When we format namenode it formats the meta-data related to data-nodes. – <a href="https://stackoverflow.com/a/18873340">From StackOverflow</a></p>
</blockquote>
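To verify that the format step actually created the metadata, you can peek into the NameNode data directory. The path below assumes the <code>dfs.namenode.name.dir</code> location configured earlier in this guide; adjust it if yours differs:

```shell
# A freshly formatted NameNode directory contains a 'current' subdirectory
# holding a VERSION file and an initial fsimage
ls /usr/local/hadoop/data/namenode/current
```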
<h2 id="start-and-test-the-cluster!" tabindex="-1">Start and test the Cluster!</h2>
<p><strong>These commands will be used only on the main node</strong>
Now we can start the <code>hdfs</code> cluster with the command</p>
<pre><code class="language-bash">start-dfs.sh
</code></pre>
<p>If the previous command didn't complain about anything, we can create a test directory in our <code>HDFS</code> filesystem with</p>
<pre><code class="language-bash">hadoop fs -mkdir -p /datastore
</code></pre>
<p><em>Note that we used the full <code>hadoop fs</code> command, but earlier in our profile we added the alias <code>hfs</code></em>.</p>
<p>Now check the size of the files inside the <code>datanode</code> directory</p>
<pre><code class="language-bash">du -sh /usr/local/hadoop/data/datanode
</code></pre>
<p>and we can create a new directory inside it and, as a test, upload the <code>.tar.gz</code> file of hadoop</p>
<pre><code class="language-bash">hfs -mkdir -p /datastore/test
hfs -copyFromLocal ~/hadoop-2.4.1.tar.gz /datastore/test/
</code></pre>
<p>Now check again the size of the files inside the <code>datanode</code> directory. You can run the same command on all nodes and see that the file is also on those other servers (<em>all of it or only a part, depending on the replication level and the number of nodes you have</em>)</p>
<pre><code class="language-bash">du -sh /usr/local/hadoop/data/datanode
</code></pre>
<p>You can check the content of the <code>hdfs</code> directory with</p>
<pre><code class="language-bash">hfs -ls /datastore
</code></pre>
<p>and remove all the files with</p>
<pre><code class="language-bash">hfs -rm /datastore/test/*
</code></pre>
<p>In case you want to delete an entire directory you can instead use</p>
<pre><code class="language-bash">hfs -rm -r /datastore/test
</code></pre>
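Besides running <code>du</code> on each machine, you can also ask the NameNode itself how storage is distributed across the cluster:

```shell
# Lists every live DataNode with its configured capacity, used space,
# and remaining space, as seen by the NameNode
hdfs dfsadmin -report
```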
<p>The distributed file system is now running, and you can check its processes with</p>
<pre><code class="language-bash">jps
</code></pre>
<p>which will give you, on the main node, something like</p>
<pre><code class="language-bash">18755 DataNode
18630 NameNode
18969 SecondaryNameNode
19387 Jps
</code></pre>
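On the worker nodes, where no NameNode or SecondaryNameNode runs, <code>jps</code> should report only the DataNode daemon (the process IDs will of course differ):

```shell
jps
# e.g., on mynode3:
# 4242 DataNode
# 4311 Jps
```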
<p>Up to here we have set up the distributed filesystem. This will come in handy not only for <code>hadoop</code>, but also for other distributed computation engines, like <a href="https://spark.apache.org/docs/latest/spark-standalone.html">Spark</a> or <a href="https://flink.incubator.apache.org/">Flink</a> (formerly <a href="https://stratosphere.eu/">Stratosphere</a>).</p>
<p>Finally, to start the actual <code>hadoop</code> <code>yarn</code> execution engine, you just run</p>
<pre><code class="language-bash">start-yarn.sh
</code></pre>
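After <code>start-yarn.sh</code> completes, running <code>jps</code> again on the main node should additionally show the YARN daemons: the ResourceManager (the cluster-wide scheduler) and, since the master is also listed in the <code>slaves</code> file, a NodeManager. The PIDs below are illustrative:

```shell
jps
# In addition to the HDFS daemons, you should now see something like:
# 20112 ResourceManager
# 20345 NodeManager
```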
<h2 id="configure-hostnames" tabindex="-1">Configure Hostnames</h2>
<p>As a side note, in this guide we used IP addresses in the configuration files. If you want to use the shortnames instead, you should first update <code>/etc/hosts</code> on all machines so that all of them are listed with their shortname.</p>
<pre><code class="language-bash">10.10.10.104 mynode1
10.10.10.105 mynode2
10.10.10.106 mynode3
</code></pre>
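With these mappings in place, the configuration entries shown earlier can use the short names instead of the raw IPs; for instance, the <code>core-site.xml</code> entry becomes:

```xml
<property>
<name>fs.defaultFS</name>
<value>hdfs://mynode1/</value>
<description>NameNode URI</description>
</property>
```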
<p>In this case, make sure that the only appearance of the IP <code>127.0.0.1</code> is with <code>localhost</code>.
This is very important, so if in your <code>hosts</code> file there is a line like</p>
<pre><code class="language-bash">127.0.0.1 mynode1
</code></pre>
<p><strong>delete it!</strong></p>
<h1 id="searching-with-xq%3A-the-exemplar-query-search-engine" tabindex="-1">Searching with XQ: the eXemplar Query Search Engine</h1>
<h2 id="davide-mottin%2C-matteo-lissandrini%2C-themis-palpanas%2C-yannis-velegrakis" tabindex="-1">Davide Mottin, Matteo Lissandrini, Themis Palpanas, Yannis Velegrakis</h2>
<section class="links">
<p>
Download a digital copy (<a class="attachment" href="https://people.cs.aau.dk/~matteo/pdf/SIGMOD14-xq.pdf">PDF</a>)
</p><p>
The final authenticated version is available online at <a href="https://doi.org/10.1145/2588555.2594529">10.1145/2588555.2594529</a>.
</p></section>
<h3 id="abstract%3A" tabindex="-1">Abstract:</h3>
<section class="intro secsubead">
We demonstrate <abbr title="eXemplar Query">XQ</abbr>, a query engine that implements a novel technique for searching relevant information on the web and in various data sources, called <em>Exemplar Queries</em>.
While the traditional query model expects the user to provide a set of specifications that the elements of interest need to satisfy, <abbr>XQ</abbr> expects the user to provide only an element of interest and we infer the desired answer set based on that element.
Through the various examples we demonstrate the functionality of the system and its applicability in various cases.
At the same time, we highlight the technical challenges for this type of query answering and illustrate the implementation approach we have materialized.
The demo is intended for both researchers and practitioners and aims at illustrating the benefits of the adoption of this new form of query answering in practical applications and the further study and advancement of its technical solutions.
</section>
<h3 id="cite%3A" tabindex="-1">Cite:</h3>
<blockquote itemscope="" itemtype="http://schema.org/ScholarlyArticle">
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Mottin, Davide</span>;
</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Lissandrini, Matteo</span>;
</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Velegrakis, Yannis</span>;
</span>
and
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Palpanas, Themis</span>.
</span>
<br>
“<span itemprop="headline name">Searching with XQ: The Exemplar Query Search Engine</span>.”
<br>
<div class="hidden">
<time datetime="2014-06-22" itemprop="datePublished">June, 2014</time>
</div>
<span itemprop="isPartOf" itemscope="" itemtype="http://schema.org/Periodical">
<em><span itemprop="name">Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data</span></em>
</span> (pp. <span itemprop="pageStart">901</span>-<span itemprop="pageEnd">904</span>).
</blockquote>
<pre><code class="lang-bibtex">
@inproceedings{Mottin:2014:SXE:2588555.2594529,
author = {Mottin, Davide and Lissandrini, Matteo and Velegrakis, Yannis and Palpanas, Themis},
title = {Searching with XQ: The Exemplar Query Search Engine},
booktitle = {Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data},
series = {SIGMOD '14},
year = {2014},
isbn = {978-1-4503-2376-5},
location = {Snowbird, Utah, USA},
pages = {901--904},
numpages = {4},
url = {http://doi.acm.org/10.1145/2588555.2594529},
doi = {10.1145/2588555.2594529},
acmid = {2594529},
publisher = {ACM},
address = {New York, NY, USA},
keywords = {exemplar queries, labeled graphs, query paradigms},
}
</code></pre>
<h1 id="exemplar-queries%3A-give-me-an-example-of-what-you-need" tabindex="-1">Exemplar Queries: Give me an Example of What You Need</h1>
<h2 id="davide-mottin%2C-matteo-lissandrini%2C-themis-palpanas%2C-yannis-velegrakis" tabindex="-1">Davide Mottin, Matteo Lissandrini, Themis Palpanas, Yannis Velegrakis</h2>
<section class="links">
<p>
Download a digital copy (<a class="attachment" href="https://people.cs.aau.dk/~matteo/pdf/VLDB14-exemplarqueries.pdf">PDF</a>)
</p><p>
The final authenticated version is available online at <a href="https://doi.org/10.14778/2732269.2732273">10.14778/2732269.2732273</a>.
</p></section>
<h3 id="abstract%3A" tabindex="-1">Abstract:</h3>
<section class="intro secsubead">
Search engines are continuously employing advanced techniques that aim to capture user intentions and provide results that go beyond the data that simply satisfy the query conditions.
Examples include the personalized results, related searches, similarity search, popular and relaxed queries.
In this work we introduce a novel query paradigm that considers a user query as an example of the data in which the user is interested.
We call these queries <em>exemplar queries</em> and claim that they can play an important role in dealing with the information deluge.
We provide a formal specification of the semantics of such queries and show that they are fundamentally different from notions like queries by example, approximate and related queries.
We provide an implementation of these semantics for graph-based data and present an exact solution with a number of optimizations that improve performance without compromising the quality of the answers.
We also provide an approximate solution that prunes the search space and achieves considerably better time-performance with minimal or no impact on effectiveness.
We experimentally evaluate the effectiveness and efficiency of these solutions with synthetic and real datasets, and illustrate the usefulness of exemplar queries in practice.
</section>
<h3 id="cite%3A" tabindex="-1">Cite:</h3>
<blockquote itemscope="" itemtype="http://schema.org/ScholarlyArticle">
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Mottin, Davide</span>;
</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Lissandrini, Matteo</span>;
</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Velegrakis, Yannis</span>;
</span>
and
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<span itemprop="name">Palpanas, Themis</span>.
</span>
<br>
“<span itemprop="headline name">Exemplar Queries: Give Me an Example of What You Need</span>.”
<br>
<div class="hidden">
<time datetime="2014-01-27" itemprop="datePublished">January, 2014</time>
<span itemprop="image">https://people.cs.aau.dk/~matteo/images/xq-logo.png</span>
</div>
<span itemprop="isPartOf" itemscope="" itemtype="http://schema.org/Periodical">
<em><span itemprop="name">Proceedings of the VLDB Endowment</span></em>
</span>
<span itemprop="isPartOf" itemscope="" itemtype="http://schema.org/PublicationVolume">
<span itemprop="volumeNumber">7</span>
</span>, <span itemprop="isPartOf" itemscope="" itemtype="http://schema.org/PublicationIssue">
(<span itemprop="issueNumber">5</span>)
(<time datetime="2014-01-01" itemprop="datePublished">January, 2014</time>):
</span>
<span itemprop="pageStart">365</span>-<span itemprop="pageEnd">376</span>.
</blockquote>
<pre><code class="lang-bibtex">
@article{Mottin:2014:EQG:2732269.2732273,
author = {Mottin, Davide and Lissandrini, Matteo and Velegrakis, Yannis and Palpanas, Themis},
title = {Exemplar Queries: Give Me an Example of What You Need},
journal = {Proceedings of the VLDB Endowment},
issue_date = {January 2014},
volume = {7},
number = {5},
month = jan,
year = {2014},
issn = {2150-8097},
pages = {365--376},
numpages = {12},
url = {http://dx.doi.org/10.14778/2732269.2732273},
doi = {10.14778/2732269.2732273},
acmid = {2732273},
publisher = {VLDB Endowment},
}
</code></pre>
Processing a hybrid flow associated with a service class2013-09-27T00:00:00Zhttps://people.cs.aau.dk/publications/patent/2013-hp-hybrid-flow.html<h1 id="processing-a-hybrid-flow-associated-with-a-service-class" tabindex="-1">Processing a hybrid flow associated with a service class</h1>
<h2 id="alkiviadis-simitsis%2C-william-kevin-wilkinson%2C-matteo-lissandrini---original-assignee-hewlett-packard-development-company%2C-l.p." tabindex="-1">Alkiviadis Simitsis, William Kevin Wilkinson, Matteo Lissandrini - Original Assignee Hewlett-Packard Development Company, L.P.</h2>
<h3 id="abstract-%3A" tabindex="-1">Abstract :</h3>
<section class="intro secsubead">
Described herein are techniques for processing hybrid flows. A hybrid flow may be associated with one of a plurality of service classes and may include sub-flows directed to multiple execution environments. An execution schedule may be generated based on an objective associated with the service class and resource allocation or availability. An action may be taken according to a policy associated with the service class if execution of the hybrid flow fails to meet the objective.
</section>