Abstract: The linked open data cloud, with its huge volume of data from heterogeneous and interlinked datasets, has turned the Web into a large data store. It has attracted the attention of both developers and researchers in recent years, opening up new dimensions in machine learning and knowledge discovery. Information extraction procedures for these processes use different approaches, e.g., template-based extraction, federated queries over multiple sources, and fixed-depth link traversal. These approaches are limited by problems with online access to the datasets' SPARQL endpoints, such as servers being down for maintenance, bandwidth throttling, and limits on the number of requests allowed within a given time slot, which may result in imprecise and incomplete feature vectors and thus affect the quality of the discovered knowledge.

The work presented here addresses the disadvantages of online data retrieval by proposing a simple and automatic way to extract features from the linked open data cloud using a link traversal approach in a local environment with a previously identified and known set of interlinked RDF datasets. The user has the flexibility to set the depth to which neighboring properties are traversed during information extraction, producing a feature vector that can be used for machine learning and knowledge discovery. The experiments are performed locally, with the Virtuoso triple store used to store the datasets and an interface developed to build the feature vector. The evaluation compares the obtained feature vector against gold-standard instances annotated manually and includes a case study estimating the effect of a country's demography on its movie production. The advantages of the proposed approach are that it overcomes the problems of online access to data from the linked data cloud, integrates RDF datasets in both local and web environments to build feature vectors for machine learning, and generates background knowledge from the linked data cloud.
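To make the core idea concrete, the following is a minimal sketch of depth-bounded link traversal over a local SPARQL endpoint, in the spirit of the approach described above. It assumes a Virtuoso instance running at its default endpoint (http://localhost:8890/sparql) with the interlinked RDF datasets already loaded; the endpoint URL, the seed URI, the depth value, and the use of the SPARQLWrapper library are illustrative choices, not details taken from the paper.

```python
# A minimal sketch of depth-bounded link traversal for feature extraction.
# Assumptions (not from the paper): a local Virtuoso SPARQL endpoint at its
# default URL, SPARQLWrapper as the client library, and an illustrative seed URI.
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "http://localhost:8890/sparql"  # default Virtuoso SPARQL endpoint

def traverse(seed_uri: str, depth: int) -> dict:
    """Collect (property, value) pairs reachable from seed_uri within the
    user-chosen traversal depth, as a flat feature dictionary.
    For brevity, only the last value seen per (level, property) key is kept."""
    sparql = SPARQLWrapper(ENDPOINT)
    sparql.setReturnFormat(JSON)
    features = {}
    frontier = [seed_uri]
    for level in range(depth):
        next_frontier = []
        for uri in frontier:
            # Fetch all outgoing properties of the current resource.
            sparql.setQuery(f"SELECT ?p ?o WHERE {{ <{uri}> ?p ?o }}")
            for row in sparql.query().convert()["results"]["bindings"]:
                p, o = row["p"]["value"], row["o"]["value"]
                features[f"{level}:{p}"] = o
                # Follow only URI-valued objects to the next depth level.
                if row["o"]["type"] == "uri":
                    next_frontier.append(o)
        frontier = next_frontier
    return features

# Hypothetical usage, assuming a DBpedia resource is in the local store:
# vec = traverse("http://dbpedia.org/resource/India", depth=2)
```

Because the traversal runs against a local triple store, the sketch sidesteps the endpoint-availability and rate-limit issues of online retrieval, and the depth parameter gives the user direct control over how far the neighborhood is explored and hence over the size of the resulting feature vector.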