Proceedings of the Fourth ACM International Conference on Web Search and Data Mining 2011
DOI: 10.1145/1935826.1935903

Efficient entity resolution for large heterogeneous information spaces

Abstract: We have recently witnessed an enormous growth in the volume of structured and semi-structured data sets available on the Web. An important prerequisite for using and combining such data sets is the detection and merge of information that describes the same real-world entities, a task known as Entity Resolution. To make this quadratic task efficient, blocking techniques are typically employed. However, the high dynamics, loose schema binding, and heterogeneity of (semi-)structured data, impose new challenges to…
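The abstract notes that blocking is used to avoid comparing every pair of records. As a rough illustration, here is a minimal Python sketch of one simple, schema-agnostic blocking scheme: records that share any token in any attribute value land in the same block, and only records within a block are compared. This is an assumption-laden toy, not necessarily the method of the paper; the function names `token_blocking` and `candidate_pairs` and the sample records are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def token_blocking(records):
    """Group record ids by every token found in any attribute value.

    Attribute names are ignored, so records with different schemas
    can still share a block. (Illustrative helper, not from the paper.)
    """
    blocks = defaultdict(set)
    for rid, attrs in records.items():
        for value in attrs.values():
            for token in str(value).lower().split():
                blocks[token].add(rid)
    # A block holding a single record produces no comparisons, so drop it.
    return {tok: ids for tok, ids in blocks.items() if len(ids) > 1}


def candidate_pairs(blocks):
    """Collect the distinct record pairs that co-occur in at least one block."""
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


# Hypothetical records with deliberately inconsistent attribute names.
records = {
    "a": {"name": "Anna Smith", "city": "Hannover"},
    "b": {"fullName": "Smith Anna", "location": "Hannover"},
    "c": {"title": "Bob Jones", "town": "Berlin"},
    "d": {"label": "Jones Robert"},
}

pairs = candidate_pairs(token_blocking(records))
all_pairs = len(records) * (len(records) - 1) // 2
print(sorted(pairs), f"{len(pairs)} of {all_pairs} possible comparisons")
```

With these four records, only the pairs that share a token are kept as candidates, rather than all six possible pairs. A real system would still need a matching step over the candidates and some way to handle very frequent tokens, both of which this sketch leaves out.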

Citations: cited by 74 publications (103 citation statements)
References: 24 publications
“…Therefore, we are not only required to establish the map between 1453505_a_at and EIF2AK3, we also need to generate the transitive map between 1453505_a_at and PERK via EIF2AK3. In other words, the traditional record linkage [19], [20], [21], [22], [23], [24], [25] and mapping techniques largely do not help, and a higher level mapping method is warranted.…”
Section: Information Aggregation Using Id Conversion (mentioning)
confidence: 99%
“…BioFlow also supports two other statements called combine and link to facilitate entity based [19] (as opposed to tuple based) union and join respectively, which can be used to collect IDs from different sources, and for implementing the extend statement despite representation heterogeneities. The syntax for the two statements is presented below.…”
Section: Query Translation (mentioning)
confidence: 99%
“…These techniques deploy a variety of different methodologies and directions. These include string similarity metrics [8,9] for computing the matching between the given textual representations of entities, the use of the available inner-relationships between the entities [17,27], clustering [7], and blocking techniques [30,31] for reducing the required execution time. However, as already explained in Sect.…”
Section: Related Work (mentioning)
confidence: 99%
“…Entity resolution (ER), also known as record linkage, entity reconciliation, or merge/purge, is the procedure of identifying a group of entities (records) representing the same real-world entity [1][2][3]. Generally speaking, ER has become the first step of data processing and is widely used in many application domains, such as digital libraries, smart city, financial transactions, and social networks.…”
Section: Introduction (mentioning)
confidence: 99%