Advances in Large-Scale RDF Data Management

Boncz, Peter; Erling, Orri; Pham, Minh Tu

doi:10.1007/978-3-319-09846-3_2

Cited by 13 publications

(11 citation statements)

References 9 publications


“…Regardless of the underneath model (based on a relational schema, implementing a native index or a NOSQL solution), RDF stores often speed up quad-based queries by indexing different combinations of the subject, predicate, object and graph elements in RDF [13]. Virtuoso [5] implements quads in a column-based relational store, with two full indexes over the RDF quads, with PSOG and POSG order, and 3 projections SP, OP and GS. The well-known Apache Jena TDB 5 stores RDF datasets using 6 B+Trees indexes, namely SPOG, POSG, OSPG, GSPO, GPOS and GOSP.…”

Section: State Of the Art (mentioning)

confidence: 99%
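The indexing idea in the statement above can be illustrated with a minimal in-memory sketch. It assumes the six orderings the statement lists for Jena TDB (SPOG, POSG, OSPG, GSPO, GPOS, GOSP) and uses sorted Python lists in place of B+Trees. The class name `QuadIndexes` and its methods are hypothetical and are not taken from Jena or Virtuoso.

```python
from bisect import bisect_left

# Orderings named in the quoted statement; each letter is one quad component.
ORDERS = ["SPOG", "POSG", "OSPG", "GSPO", "GPOS", "GOSP"]


class QuadIndexes:
    """Hypothetical toy store: one sorted list of permuted quads per ordering."""

    def __init__(self, quads):
        self.indexes = {
            order: sorted(self._permute(q, order) for q in quads)
            for order in ORDERS
        }

    @staticmethod
    def _permute(quad, order):
        s, p, o, g = quad
        by_name = {"S": s, "P": p, "O": o, "G": g}
        return tuple(by_name[c] for c in order)

    def choose_order(self, pattern):
        """Pick the ordering whose leading components are bound for the longest run."""
        def bound_prefix_len(order):
            n = 0
            for c in order:
                if c not in pattern:
                    break
                n += 1
            return n
        return max(ORDERS, key=bound_prefix_len)

    def match(self, **bound):
        """Return quads matching the bound components, e.g. match(p="knows")."""
        pattern = {k.upper(): v for k, v in bound.items()}
        order = self.choose_order(pattern)
        prefix = []
        for c in order:
            if c not in pattern:
                break
            prefix.append(pattern[c])
        prefix = tuple(prefix)
        index = self.indexes[order]
        # Binary search to the first row sharing the prefix, then scan the range.
        start = bisect_left(index, prefix)
        results = []
        for row in index[start:]:
            if row[:len(prefix)] != prefix:
                break
            quad = dict(zip(order, row))
            # Bound components outside the prefix are checked row by row.
            if all(quad[c] == v for c, v in pattern.items()):
                results.append((quad["S"], quad["P"], quad["O"], quad["G"]))
        return results
```

A lookup with only the predicate bound is served from the POSG list in this sketch, while a lookup with graph, subject and predicate bound is served from GSPO. Keeping several orderings trades storage for the ability to answer most patterns with a single range scan.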

HDTQ: Managing RDF Datasets in Compressed Space

Fernández

Martínez‐Prieto

Polleres

et al. 2018

Lecture Notes in Computer Science


Abstract. HDT (Header-Dictionary-Triples) is a compressed representation of RDF data that supports retrieval features without prior decompression. Yet, RDF datasets often contain additional graph information, such as the origin, version or validity time of a triple. Traditional HDT is not capable of handling this additional parameter(s). This work introduces HDTQ (HDT Quads), an extension of HDT that is able to represent quadruples (or quads) while still being highly compact and queryable. Two HDTQ-based approaches are introduced: Annotated Triples and Annotated Graphs, and their performance is compared to the leading open-source RDF stores on the market. Results show that HDTQ achieves the best compression rates and is a competitive alternative to well-established systems.



“…If the query optimizer's decision about whether to break up a pipeline is correct (which is non-trivial [24]), Peloton can be faster than both standard models. [27]: push, interpretation, 2001; MonetDB [9]: n/a, vectorization, 1996; VectorWise [7]: pull, vectorization, 2005; Virtuoso [8]: push, vectorization, 2013; Hique [18]: n/a, compilation, 2010; HyPer [28]: push, compilation, 2011; Hekaton [12]: pull, compilation, 2014…”

Section: Hybrid Models (mentioning)

confidence: 99%

“…One advantage of the push model is that it enables DAG-structured query plans (as opposed to trees), i.e., an operator may push its output to more than one consumer [27]. Push-based execution also has advantages in distributed query processing with Exchange operators, which is one of the reasons it has been adopted by Virtuoso [8]. One downside of the push model is that it is slightly less flexible in terms of control flow: A merge-sort, for example, has to fully materialize one input relation.…”

Section: Other Query Processing Models (mentioning)

confidence: 99%
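The DAG-shaped plans mentioned in the statement above can be sketched as follows. This is a toy illustration of push-based execution in general, not Virtuoso's engine; the operator classes `Scan`, `Filter`, `Collect` and `Sum` are hypothetical names introduced for the example.

```python
class Operator:
    """Base class: an operator pushes each output row to all of its consumers."""

    def __init__(self):
        self.consumers = []

    def add_consumer(self, op):
        self.consumers.append(op)
        return op

    def emit(self, row):
        # Fan-out: one producer may feed several consumers, so the plan is a DAG.
        for consumer in self.consumers:
            consumer.push(row)

    def push(self, row):
        raise NotImplementedError


class Scan(Operator):
    """Source operator: drives the pipeline by pushing every input row."""

    def __init__(self, rows):
        super().__init__()
        self.rows = rows

    def run(self):
        for row in self.rows:
            self.emit(row)


class Filter(Operator):
    def __init__(self, predicate):
        super().__init__()
        self.predicate = predicate

    def push(self, row):
        if self.predicate(row):
            self.emit(row)


class Collect(Operator):
    """Sink that keeps every row it receives."""

    def __init__(self):
        super().__init__()
        self.rows = []

    def push(self, row):
        self.rows.append(row)


class Sum(Operator):
    """Sink that keeps a running total of the rows it receives."""

    def __init__(self):
        super().__init__()
        self.total = 0

    def push(self, row):
        self.total += row


def demo():
    scan = Scan(range(10))
    evens = scan.add_consumer(Filter(lambda x: x % 2 == 0))
    collected = evens.add_consumer(Collect())  # first consumer of the filter
    total = evens.add_consumer(Sum())          # second consumer of the same filter
    scan.run()
    return collected.rows, total.total
```

In `demo`, the filter's output is pushed to two sinks in a single pass over the input. In a pull-based tree plan, the same sharing would typically need the filter result to be materialized or computed twice.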

Everything you always wanted to know about compiled and vectorized queries but were afraid to ask

Kersten, Timo

Leis, Viktor

Kemper, Alfons

et al. 2018

Proc. VLDB Endow.

Self Cite


The query engines of most modern database systems are either based on vectorization or data-centric code generation. These two state-of-the-art query processing paradigms are fundamentally different in terms of system structure and query execution code. Both paradigms were used to build fast systems. However, until today it is not clear which paradigm yields faster query execution, as many implementation-specific choices obstruct a direct comparison of architectures. In this paper, we experimentally compare the two models by implementing both within the same test system. This allows us to use for both models the same query processing algorithms, the same data structures, and the same parallelization framework to ultimately create an apples-to-apples comparison. We find that both are efficient, but have different strengths and weaknesses. Vectorization is better at hiding cache miss latency, whereas data-centric compilation requires fewer CPU instructions, which benefits cache-resident workloads. Besides raw, single-threaded performance, we also investigate SIMD as well as multi-core parallelization and different hardware architectures. Finally, we analyze qualitative differences as a guide for system architects.
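The structural difference between the two paradigms in this abstract can be sketched on one toy query, the sum of `prices` over rows where `quantities` is below a bound. The function names, the query and the vector size of 1024 are assumptions chosen for illustration. Python only shows the shape of the code here, not the performance behaviour the paper measures.

```python
VECTOR_SIZE = 1024  # assumed batch size, for illustration only


def select_less_than(values, bound):
    """Vectorized primitive: return positions in the batch that pass the filter."""
    return [i for i, v in enumerate(values) if v < bound]


def sum_selected(values, selection):
    """Vectorized primitive: aggregate only the selected positions of the batch."""
    return sum(values[i] for i in selection)


def vectorized_query(prices, quantities, max_quantity):
    """Each primitive handles a whole batch and materializes a selection vector."""
    total = 0
    for start in range(0, len(prices), VECTOR_SIZE):
        price_batch = prices[start:start + VECTOR_SIZE]
        quantity_batch = quantities[start:start + VECTOR_SIZE]
        selection = select_less_than(quantity_batch, max_quantity)
        total += sum_selected(price_batch, selection)
    return total


def fused_query(prices, quantities, max_quantity):
    """Shape of data-centric generated code: one fused loop for the pipeline,
    with no intermediate selection vector."""
    total = 0
    for price, quantity in zip(prices, quantities):
        if quantity < max_quantity:
            total += price
    return total
```

Both functions compute the same result. The vectorized version spends extra work on the intermediate selection vector but runs simple loops over batches, while the fused version touches each tuple once in a single loop.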


“…The technology for RDF triple stores is not as mature as for relational databases and this is reflected in their performance as witnessed by the so-called "RDF tax", although recent work has been done to improve this (Boncz et al, 2014). Performance for this prototype was also affected by the quality of the data contained in the database and the type of query performed.…”

Section: Performance (mentioning)

confidence: 99%

Linked Data for Language-Learning Applications

Loughnane

McCurdy

Kolb

et al. 2017

Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications


The use of linked data within language-learning applications is an open research question. A research prototype is presented that applies linked-data principles to store linguistic annotation generated from language-learning content using a variety of NLP tools. The result is a database that links learning content, linguistic annotation and open-source resources, on top of which a diverse range of tools for language-learning applications can be built.
