2016
DOI: 10.1016/j.websem.2015.12.003

Learning the semantics of structured data sources

Abstract: Information sources such as relational databases, spreadsheets, XML, JSON, and Web APIs contain a tremendous amount of structured data that can be leveraged to build and augment knowledge graphs. However, they rarely provide a semantic model to describe their contents. Semantic models of data sources represent the implicit meaning of the data by specifying the concepts and the relationships within the data. Such models are the key ingredients to automatically publish the data into knowledge graphs. Manually mo…

Cited by 69 publications (39 citation statements)
References 37 publications
“…The approach analyzes the content of the column using NLP techniques and recommends an RDF type in conjunction with a datatype property containing the literal value of a column's cell. The other recommendation approach is based on what the user has previously modeled [32]. For example, if she has already modeled data entities and relationships about museum items, and the next data collection contains data on other museum items, the system is likely to recognize this and recommend the vocabulary terms that were used to model the previous data collection.…”
Section: Vocabulary Recommender Systems (mentioning)
confidence: 99%
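The history-based recommendation described above can be illustrated with a minimal sketch: vocabulary terms used in previously modeled collections are remembered per attribute name, and terms for a new column are ranked by string similarity against those past attributes. The class and method names (PastModelStore, record, recommend) and the similarity measure are invented for this example and are not taken from the cited system.

```python
# Minimal sketch of a history-based vocabulary recommender, assuming past
# models are stored as (attribute name -> vocabulary term) pairs.
# All names here are hypothetical, not from the paper.
from difflib import SequenceMatcher
from collections import defaultdict

class PastModelStore:
    """Remembers which vocabulary terms were used for which source attributes."""
    def __init__(self):
        self.term_usage = defaultdict(list)  # attribute name -> [vocabulary terms]

    def record(self, attribute: str, term: str) -> None:
        self.term_usage[attribute.lower()].append(term)

    def recommend(self, new_attribute: str, top_k: int = 3):
        """Score past attributes by string similarity and return their terms."""
        scores = []
        for attr, terms in self.term_usage.items():
            sim = SequenceMatcher(None, new_attribute.lower(), attr).ratio()
            for term in terms:
                scores.append((sim, term))
        scores.sort(reverse=True)
        # Deduplicate while preserving rank order.
        seen, ranked = set(), []
        for _, term in scores:
            if term not in seen:
                seen.add(term)
                ranked.append(term)
        return ranked[:top_k]

store = PastModelStore()
store.record("artworkTitle", "dcterms:title")
store.record("creatorName", "dcterms:creator")
print(store.recommend("artwork_title"))  # terms used for similar past attributes
```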
“…More precisely, considering the presence of correspondences between the schema in the source representation and the target schema, schema matching methods find the matches between the source properties and the target properties, while schema mapping techniques find the mapping rules for transforming the information in the source according to the target schema. Among schema‐matching techniques, ontology‐matching methods (Dhamankar et al, 2004; Taheriyan et al, 2016) have been extensively investigated, where the target schema is an ontology containing different classes that can also be hierarchically organized. Conversely, KB‐mapping approaches (Bhagavatula et al, 2015; Chu et al, 2015; Limaye et al, 2010; Mulwad et al, 2013) map cell/tuple values of the table to KB instances and then exploit both probabilistic graphical models and iterative algorithms to explore the correlation between different matching tasks for disambiguation.…”
Section: Approaches For the Table Understanding Problem (mentioning)
confidence: 99%
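As a rough illustration of the schema-matching step mentioned in the excerpt, the sketch below matches source column headers against the local names of target-ontology properties using plain string similarity. The property list, the threshold, and the function name are assumptions made for the example; real ontology matchers use far richer lexical and structural evidence.

```python
# Toy label-based schema matching between source columns and target-ontology
# properties; the ontology dictionary and the 0.5 threshold are illustrative only.
from difflib import SequenceMatcher

def match_schema(source_columns, ontology_properties, threshold=0.5):
    """Return the best-matching ontology property for each source column."""
    matches = {}
    for col in source_columns:
        best_prop, best_score = None, 0.0
        for prop in ontology_properties:
            # Compare the column header against the local name of the property.
            local_name = prop.split(":")[-1]
            score = SequenceMatcher(None, col.lower(), local_name.lower()).ratio()
            if score > best_score:
                best_prop, best_score = prop, score
        if best_score >= threshold:
            matches[col] = (best_prop, round(best_score, 2))
    return matches

columns = ["birth_date", "artist_name", "museum"]
properties = ["dbo:birthDate", "foaf:name", "dbo:museum"]
print(match_schema(columns, properties))
```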
“…Many systems, for example, OpenRefine, Wrangler (Kandel, Paepcke, Hellerstein, & Heer, 2011; Trifacta, 2020), Potter's Wheel (Raman & Hellerstein, 2001), and Senbazuru (Chen, Cafarella, Chen, Prevo, & Zhuang, 2013), have also been proposed by the academic and industrial communities to support users in the extraction and transformation of table data and in the generation of programs by examples, for example, ProgFromEx (Gulwani, 2011), FlashRelate (Barowy, Gulwani, Hart, & Zorn, 2015), and Foofah (Jin, Anderson, Cafarella, & Jagadish, 2017). With specific regard to table interpretation, well‐known approaches that rely on schema matching (Bellahsene, Bonifati, & Rahm, 2011; Dhamankar, Lee, Doan, Halevy, & Domingos, 2004; Taheriyan, Knoblock, Szekely, & Ambite, 2016) have recently been replaced by approaches that combine schema matching with cell mapping (Bhagavatula, Noraset, & Downey, 2015; Chu et al, 2015; Limaye, Sarawagi, & Chakrabarti, 2010; Mulwad, Finin, & Joshi, 2013; Ritze, Lehmberg, & Bizer, 2015; Zhang, 2017) to KBs (e.g., YAGO, DBpedia, and WordNet) automatically extracted from the Web and covering different domains. Moreover, there has been an increasing use of deep learning (DL) techniques (Chen, Jiménez‐Ruiz, Horrocks, & Sutton, 2019; Efthymiou, Hassanzadeh, Rodriguez‐Muro, & Christophides, 2017; Takeoka, Oyamada, Nakadai, & Okadome, 2019), which have shown promising results when dealing with noisy, heterogeneous, incomplete, and ambiguous data, all of which make the extraction process even harder (Thirumuruganathan, Tang, Ouzzani, & Doan, 2018).…”
Section: Introduction (mentioning)
confidence: 99%
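The cell-mapping idea described above can be sketched very roughly: each cell value is looked up in a label index over the KB, and a column-level majority vote over candidate types stands in for the probabilistic or iterative disambiguation the cited approaches actually use. The mini label index and all identifiers below are invented for the example.

```python
# Simplified cell-to-KB mapping with column-level type voting.
# The "knowledge base" below is a toy stand-in, not a real KB API.
from collections import Counter

KB_LABEL_INDEX = {
    "paris":  [("dbr:Paris", "dbo:City"), ("dbr:Paris_Hilton", "dbo:Person")],
    "berlin": [("dbr:Berlin", "dbo:City")],
    "madrid": [("dbr:Madrid", "dbo:City")],
}

def map_column_cells(cells):
    """Link each cell to a KB entity, preferring the column's dominant type."""
    candidates = {cell: KB_LABEL_INDEX.get(cell.lower(), []) for cell in cells}
    # Majority vote over all candidate types gives the column type.
    type_counts = Counter(t for cands in candidates.values() for _, t in cands)
    column_type = type_counts.most_common(1)[0][0] if type_counts else None
    links = {}
    for cell, cands in candidates.items():
        preferred = [e for e, t in cands if t == column_type] or [e for e, _ in cands]
        links[cell] = preferred[0] if preferred else None
    return column_type, links

print(map_column_cells(["Paris", "Berlin", "Madrid"]))
# Expected shape: ('dbo:City', {'Paris': 'dbr:Paris', ...})
```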
“…Before feeding the record sets returned by data extraction into a particular application, it is commonly necessary to perform some of the following integration tasks: semantisation [25,45,54,55,60,63,71], which either maps the descriptors onto the terminology box of a particular ontology or the tuples onto its assertion box [19]; union [23], which merges record sets that provide similar data; finding primary keys [62], which determines which components of the tuples identify them as univocally as possible; record linkage [8,11,12], which finds different records that refer to the same actual entities; augmentation [6,52,67], which joins record sets on the same topic to complete the information that they provide individually; and cleaning [10,31,61], which fixes the data. Note that the integration tasks are orthogonal to data extraction because they are independent of the source of the record sets, which is why they fall outside the scope of this article.…”
Section: Data-extraction Vocabulary (mentioning)
confidence: 99%
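Of the integration tasks listed in the excerpt, record linkage is the most straightforward to sketch: candidate pairs are generated with a simple blocking key and compared by name similarity. The field names, blocking key, and 0.85 threshold are illustrative assumptions, not taken from any of the cited systems.

```python
# Minimal record-linkage sketch: blocking plus string similarity.
from difflib import SequenceMatcher

def blocking_key(record):
    # Block on the first letter of the name plus the year, so we avoid
    # comparing every record against every other record.
    return (record["name"][:1].lower(), record.get("year"))

def link_records(left, right, threshold=0.85):
    """Return pairs of records from the two sets that likely denote the same entity."""
    blocks = {}
    for r in right:
        blocks.setdefault(blocking_key(r), []).append(r)
    matches = []
    for l in left:
        for r in blocks.get(blocking_key(l), []):
            sim = SequenceMatcher(None, l["name"].lower(), r["name"].lower()).ratio()
            if sim >= threshold:
                matches.append((l, r, round(sim, 2)))
    return matches

a = [{"name": "Vincent van Gogh", "year": 1853}]
b = [{"name": "Vincent Van Gogh", "year": 1853}, {"name": "Paul Gauguin", "year": 1848}]
print(link_records(a, b))  # the two van Gogh records are linked
```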