Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data
DOI: 10.1145/1559845.1559870
Entity resolution with iterative blocking

Abstract: Entity Resolution (ER) is the problem of identifying which records in a database refer to the same real-world entity. An exhaustive ER process involves computing the similarities between pairs of records, which can be very expensive for large datasets. Various blocking techniques can be used to enhance the performance of ER by dividing the records into blocks in multiple ways and only comparing records within the same block. However, most blocking techniques process blocks separately and do not exploit the res…
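The cost contrast the abstract describes can be made concrete with a minimal sketch. The toy records and the blocking key below are illustrative assumptions, not taken from the paper: exhaustive ER compares all n*(n-1)/2 pairs, while blocking only compares records that share a block.

```python
from collections import defaultdict
from itertools import combinations

# Toy records; the field names and values are illustrative assumptions.
records = [
    {"id": 1, "name": "Ann Smith"},
    {"id": 2, "name": "Anne Smith"},
    {"id": 3, "name": "Bob Jones"},
    {"id": 4, "name": "Robert Jones"},
]

def blocking_key(record):
    # One deliberately crude key choice: first letter of the name.
    return record["name"][0].lower()

blocks = defaultdict(list)
for r in records:
    blocks[blocking_key(r)].append(r)

# Exhaustive ER compares every pair: n*(n-1)/2 comparisons.
exhaustive_pairs = list(combinations(records, 2))
# Blocking only compares records within the same block.
blocked_pairs = [p for b in blocks.values() for p in combinations(b, 2)]

print(len(exhaustive_pairs), len(blocked_pairs))  # 6 1
```

Blocking cuts six comparisons down to one here, but this crude key also places "Bob Jones" and "Robert Jones" in different blocks, so their comparison is never made — the recall loss that the citing passages below discuss.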

Cited by 173 publications
(116 citation statements)
References 21 publications
“…For example, blocking techniques [14] are commonly used to make ER scalable by dividing the data into (possibly overlapping) blocks and only comparing records within the same block, assuming that records in different blocks are unlikely to match. Since blocking techniques may miss matching records, their results are compared with an "exhaustive" ER solution without blocking, which is considered the gold standard [15]. While a large exhaustive ER result may be very expensive to generate, it need only be generated once, whereas the computation of the distance measure will be performed multiple times for a diverse set of blocking algorithms and parameters.…”
Section: Computing Measures
confidence: 99%
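The comparison against an exhaustive gold standard described above is commonly quantified with two measures: pair completeness (recall of true matching pairs among the blocking candidates) and the reduction ratio (fraction of pairwise comparisons avoided). A minimal sketch, with made-up pair sets rather than data from the paper:

```python
def blocking_measures(candidate_pairs, gold_pairs, n_records):
    """Pair completeness (recall w.r.t. the gold standard) and reduction ratio."""
    all_pairs = n_records * (n_records - 1) // 2
    pair_completeness = len(candidate_pairs & gold_pairs) / len(gold_pairs)
    reduction_ratio = 1 - len(candidate_pairs) / all_pairs
    return pair_completeness, reduction_ratio

# Assumed toy result: an exhaustive run found matches (1,2) and (3,4),
# while the blocking scheme only proposed the candidate pair (1,2).
gold = {(1, 2), (3, 4)}
candidates = {(1, 2)}

pc, rr = blocking_measures(candidates, gold, n_records=4)
print(pc, rr)  # pair completeness 0.5; reduction ratio 1 - 1/6
```

The trade-off the quote points at is exactly this pair: a scheme that generates fewer candidate pairs scores a higher reduction ratio but risks lower pair completeness.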
“…To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Articles from this volume were presented at The 36th International Conference on Very Large Data Bases, September 13-17, 2010, Singapore. …gold standard is generated by a group of human experts.…”
Section: Introduction
confidence: 99%
“…To correctly assess the impact of a researcher in a research field, correct attribution of research works is essential, so entity disambiguation has been extensively addressed by researchers in information retrieval and data mining. Note that a related problem considers the task of merging multiple name references into a single entity, where the records belonging to a single person have been erroneously partitioned into multiple name references [2,3,20,27,28]. This task is more popularly known as entity deduplication or record linkage, and it is not the focus of this work.…”
confidence: 99%
“…Implementations may use sorting or hashing on the key. Overlapping methods may result in overlapping blocks of entities; implementations include the (multi-pass) sorted neighborhood approach [33], bi-gram indexing [4], canopy clustering [40] and iterative blocking [59]. These methods can require an entity to be matched against multiple blocks (increased overhead) but may lead to a better recall than disjoint methods.…”
Section: Blocking Methods
confidence: 99%
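The multi-pass sorted neighborhood approach mentioned in the quote can be sketched briefly: each pass sorts the records on a different key and pairs up records inside a sliding window, and the union of the passes yields overlapping candidate sets. The records, keys, and window size below are illustrative assumptions.

```python
def sorted_neighborhood(records, key, window=2):
    """One pass: sort on a key, pair up records within a sliding window."""
    ordered = sorted(records, key=key)
    pairs = set()
    for i, left in enumerate(ordered):
        for right in ordered[i + 1 : i + window]:
            a, b = left["id"], right["id"]
            pairs.add((min(a, b), max(a, b)))
    return pairs

# Toy records with two sortable attributes (assumed for illustration).
records = [
    {"id": 1, "name": "Ann Smith",    "zip": "94305"},
    {"id": 2, "name": "Anne Smith",   "zip": "94304"},
    {"id": 3, "name": "Bob Jones",    "zip": "10001"},
    {"id": 4, "name": "Robert Jones", "zip": "10001"},
]

# Multi-pass: take the union of candidates from two different sort keys.
by_name = sorted_neighborhood(records, key=lambda r: r["name"])
by_zip = sorted_neighborhood(records, key=lambda r: r["zip"])
candidates = by_name | by_zip
```

Here the zip-code pass contributes the pair (2, 4), which the name pass never generates — illustrating how overlapping passes trade a few extra comparisons (records matched against multiple blocks) for better recall, as the quoted passage notes.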