Decomposing Federated Queries in Presence of Replicated Fragments

Montoya, Gabriela; Skaf-Molli, Hala; Molli, Pascal; Vidal, María-Esther

doi:10.2139/ssrn.3199272

Cited by 7 publications

(16 citation statements)

References 30 publications

“…Recent examples of such research include the generation of a navigable Graph of Things from live Internet of Things data sources [50] and the use of crowdsourcing to provide real-time transport data in rural areas [51], both topics with parallels to how RIs gather and expose field observations acquired via sensors or human experts. On the topic of distributed query, various languages/frameworks have been proposed such as LDQL [52] and LILAC [53], which can make linked data based search over distributed catalogues more practical than is currently the case by better distributing queries across catalogue nodes with less redundancy and then joining the results efficiently. Such developments reduce the need to aggregate as much metadata in a joint catalogue, however the demands of search (particularly with regard to perceived responsiveness to queries by end-users) make it still generally necessary to cache key metadata in a central store.…”

Section: Linking With Semantic Web (mentioning)

confidence: 99%

Mapping heterogeneous research infrastructure metadata into a unified catalogue for use in a generic virtual research environment

Martin, Remy, Θεοδωρίδου et al. 2019

Future Generation Computer Systems

Virtual Research Environments (VREs), also known as science gateways or virtual laboratories, assist researchers in data science by integrating tools for data discovery, data retrieval, workflow management and researcher collaboration, often coupled with specific computing infrastructure. Recently, the push for better open data science has led to the creation of a variety of dedicated research infrastructures (RIs) that gather data and provide services to different research communities, all of which can be used independently of any specific VRE. There is therefore a need for generic VREs that can be coupled with the resources of many different RIs simultaneously, easily customised to the needs of specific communities. The resource metadata produced by these RIs rarely all adhere to any one standard or vocabulary however, making it difficult to search and discover resources independently of their providers without some translation into a common framework. Cross-RI search can be expedited by using mapping services that harvest RI-published metadata to build unified resource catalogues, but the development and operation of such services pose a number of challenges. In this paper, we discuss some of these challenges and look specifically at the VRE4EIC Metadata Portal, which uses X3ML mappings to build a single catalogue for describing data products and other resources provided by multiple RIs. The Metadata Portal was built in accordance to the e-VRE Reference Architecture, a microservice-based architecture for generic modular VREs, and uses the CERIF standard to structure its catalogued metadata. We consider the extent to which it addresses the challenges of cross-RI search, particularly in the environmental and earth science domain, and how it can be further augmented, for example to take advantage of linked vocabularies to provide more intelligent semantic search across multiple domains of discourse.

“…In order to query the remote RDF stores a global index is required that indicates which RDF stores contain data that are relevant for the query. This index is created by retrieving statistical information that are provided by the remote RDF stores (e.g., SPLENDID [48], WoDQA [8], LHD [134], DAW [120], SemaGrow [26], FEDRA [91], LILAC [92] and Odyssey [90]) or the user (e.g., DARQ [115]), by sending special queries to the remote RDF stores (e.g., FedX [127], ANAPSID [7,93], Lusail [86]) or by observing the results that are returned during the processing of user queries (e.g., ADERIS [83]). Also combinations of these strategies are possible as in Avalanche [18].…”

Section: Federated RDF Stores (mentioning)

confidence: 99%

“…If a triple pattern with two constants is requested, the indices described so far could only restrict the number of queried compute nodes by either of the two constants. To restrict the number of queried compute nodes even further, LILAC [92], SemStore [138] additionally count how frequently all subject-property, property-object and subject-object combinations occur.…”

Section: Centralized Indices (mentioning)

confidence: 99%
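
The pair-count idea in the statement above can be illustrated with a minimal Python sketch. It is not the index used by LILAC or SemStore; the function names `build_pair_index` and `candidate_nodes`, the dictionary layout, and the toy data in the usage note are all assumptions made for illustration. The sketch counts, per compute node, how often each subject-property, property-object and subject-object combination occurs, and then uses those counts to narrow down which nodes need to be queried for a triple pattern with two constants.

```python
from collections import Counter, defaultdict


def build_pair_index(triples_by_node):
    """Count (s, p), (p, o) and (s, o) combinations per compute node.

    triples_by_node: dict mapping a node name to an iterable of
    (subject, property, object) tuples. Hypothetical input layout.
    """
    index = {
        "sp": defaultdict(Counter),
        "po": defaultdict(Counter),
        "so": defaultdict(Counter),
    }
    for node, triples in triples_by_node.items():
        for s, p, o in triples:
            index["sp"][(s, p)][node] += 1
            index["po"][(p, o)][node] += 1
            index["so"][(s, o)][node] += 1
    return index


def candidate_nodes(index, s=None, p=None, o=None):
    """Return the nodes holding at least one triple that matches a
    pattern with exactly two constants (the third position is None)."""
    if s is not None and p is not None and o is None:
        return set(index["sp"].get((s, p), {}))
    if p is not None and o is not None and s is None:
        return set(index["po"].get((p, o), {}))
    if s is not None and o is not None and p is None:
        return set(index["so"].get((s, o), {}))
    raise ValueError("exactly two of s, p, o must be constants")
```

For example, with `build_pair_index({"n1": [("a", "knows", "b"), ("a", "knows", "c")], "n2": [("a", "likes", "b")]})`, a lookup for subject `a` with property `knows` yields only `n1`, whereas an index on single constants would have returned both nodes for subject `a`.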

Storing and Querying Semantic Data in the Cloud

Janke, Staab 2018

Lecture Notes in Computer Science

In the last years, huge RDF graphs with trillions of triples were created. To be able to process this huge amount of data, scalable RDF stores are used, in which graph data is distributed over compute and storage nodes for scaling efforts of query processing and memory needs. The main challenges to be investigated for the development of such RDF stores in the cloud are: (i) strategies for data placement over compute and storage nodes, (ii) strategies for distributed query processing, and (iii) strategies for handling failure of compute and storage nodes. In this manuscript, we give an overview of how these challenges are addressed by scalable RDF stores in the cloud.

“…Ulysses uses a replication-aware source selection algorithm to identify which TPF servers can be used to distribute evaluation of triple patterns during SPARQL query processing, based on the replication model introduced in [2,3].…”

Section: Replication-aware Source Selection (mentioning)

confidence: 99%

“…Consider the SPARQL query Q1 in Figure 1, and the two servers S1 and S2 publishing a replica of the DBpedia 2015 dataset, hosted by DBpedia and LANL Linked Data Archive, respectively. Executing Q1 with the regular TPF client [4] on S1 alone generates 442 HTTP calls, takes 7s in average, and returns 222 results.…”

Section: Introduction (mentioning)

confidence: 99%

Ulysses: An Intelligent Client for Replicated Triple Pattern Fragments

Minier, Skaf-Molli, Molli et al. 2018

Lecture Notes in Computer Science

Self Cite

Ulysses is an intelligent TPF client that takes advantage of replicated datasets to distribute the load of SPARQL query processing and provides fault-tolerance. By reducing the load on a TPF server, Ulysses improves the Linked Data availability and distributes the financial costs of queries execution among data providers. This demonstration presents the Ulysses web client and shows how users can run SPARQL queries in their browsers against TPF servers hosting replicated data. It also provides various visualizations that show in real-time how Ulysses performs the actual load distribution and adapts to network conditions during SPARQL query processing.
