2014
DOI: 10.14778/2732296.2732297

A principled approach to bridging the gap between graph data and their schemas

Abstract: Although RDF graph data often come with an associated schema, recent studies have proven that real RDF data rarely conform to their perceived schemas. Since a number of data management decisions, including storage layouts, indexing, and efficient query processing, use schemas to guide the decision making, it is imperative to have an accurate description of the structuredness of the data at hand (how well the data conform to the schema). In this paper, we have approached the study of the structuredness of an RDF…

Cited by 9 publications (7 citation statements)
References 11 publications
“…In [29] graph summaries are computed based on estimating the frequency with which subgraphs match given query patterns. The paper [2] presents an approach where users first define a structuredness function σ that measures how well a given RDF graph fits to the schema and then discover a partitioning of the entities of an RDF graph into subsets which have high structuredness. They consider an optimization variant of the inference problem: finding the lowest number of types for a given threshold on σ or finding a fixed number of types that maximizes σ.…”
Section: Inference Of Shape Graphs (mentioning)
confidence: 99%
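
The idea of a user-defined structuredness function σ and of partitions with higher structuredness can be illustrated with a small sketch. The code below is not the cited paper's implementation: it assumes a simple coverage-style σ (the fraction of filled cells in a subject-by-property matrix), and the function name `coverage`, the sample triples, and the hand-picked partition are all hypothetical.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, property, object)

# Hypothetical toy data: two persons and two companies.
TRIPLES = [
    ("alice", "name", "Alice"), ("alice", "birthDate", "1980"),
    ("bob", "name", "Bob"), ("bob", "birthDate", "1975"),
    ("acme", "name", "Acme"), ("acme", "headquarters", "Paris"),
    ("initech", "name", "Initech"), ("initech", "headquarters", "Austin"),
]

def coverage(triples: Iterable[Triple], subjects: Set[str]) -> float:
    """Assumed coverage-style sigma: filled (subject, property) cells
    divided by all cells, restricted to the given subjects."""
    props_by_subject = {s: set() for s in subjects}
    for s, p, _ in triples:
        if s in props_by_subject:
            props_by_subject[s].add(p)
    # Columns of the matrix: every property used by at least one subject.
    all_props = set()
    for props in props_by_subject.values():
        all_props |= props
    if not subjects or not all_props:
        return 1.0  # an empty matrix is treated as perfectly structured
    filled = sum(len(props) for props in props_by_subject.values())
    return filled / (len(subjects) * len(all_props))

everyone = {"alice", "bob", "acme", "initech"}
persons = {"alice", "bob"}
companies = {"acme", "initech"}

sigma_all = coverage(TRIPLES, everyone)
sigma_persons = coverage(TRIPLES, persons)
sigma_companies = coverage(TRIPLES, companies)
```

On this toy data the full set fills 8 of 12 cells, while each of the two subsets fills its matrix completely, which is the sense in which a partition can have higher structuredness than the whole. The optimization variants mentioned in the quote would search over such partitions rather than take one by hand.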
“…A recent study on the structure refinement for the RDF data, [2] proposed an integer linear programming (ILP)-based algorithm which allows an RDF dataset being partitioned into a number of "sorts" where each sort satisfies a predefined structuredness fitting threshold. This approach, relying mainly on the similarity and correlation between the properties of sorts, may merge subjects describing unrelated entities but having many common properties into a single sort (as also shown in their experiment with Drug Company and Sultan), while our solution only merges related CS's together by exploiting the discriminating properties and the availability of the semantics/ontologies information.…”
Section: Related Work (mentioning)
confidence: 99%
“…3) Based on the design of RBench, a query generation process is proposed to generate different types of queries systematically for any generated benchmark. 4) Three aspects of RBench are explored in experiments: time and memory complexity of benchmark generation, benchmark datasets evaluation, and query evaluation analysis. We empirically show that benchmark datasets generated by RBench can achieve different scaling factors to fulfil different benchmark generation tasks, consistent with real scaling datasets, and address the limitations of the previous application-specific benchmark generator [8].…”
Section: Problem Definition (mentioning)
confidence: 99%
“…Coverage and coherence metrics are introduced [8], as an intuitive way to combine primitive metrics into one single measure of structuredness of RDF datasets. A comprehensive study of the structuredness of RDF graphs is also presented in [4]. A framework is proposed [4] to discover a partitioning of the entities of an RDF graph into subsets which have high structuredness with respect to a specific function chosen by the user.…”
Section: Related Work (mentioning)
confidence: 99%