2014
DOI: 10.14778/2732977.2732982

Crowdsourcing algorithms for entity resolution

Abstract: In this paper, we study a hybrid human-machine approach for solving the problem of Entity Resolution (ER). The goal of ER is to identify all records in a database that refer to the same underlying entity, and are therefore duplicates of each other. Our input is a graph over all the records in a database, where each edge has a probability denoting our prior belief (based on Machine Learning models) that the pair of records represented by the given edge are duplicates. Our objective is to resolve all the duplica…
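The abstract frames ER as a graph over records. Each edge carries a prior match probability from an ML model, and human answers are used to resolve the duplicates. Below is a minimal sketch of one generic strategy in that setting. It assumes a perfect crowd, modeled by the hypothetical `ask_crowd` callback, and it asks about edges greedily in decreasing prior probability. A pair is skipped when its answer already follows by transitivity from earlier answers. This illustrates transitivity-based pruning only; it is not the algorithm proposed in the paper.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# (record_a, record_b, prior probability that the pair are duplicates)
Edge = Tuple[int, int, float]


class UnionFind:
    """Disjoint sets of records that the crowd has confirmed as duplicates."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        self.parent[self.find(b)] = self.find(a)


def resolve(
    n: int, edges: List[Edge], ask_crowd: Callable[[int, int], bool]
) -> Tuple[List[List[int]], int]:
    """Ask about edges in decreasing prior probability, skipping any pair whose
    answer is already implied. Returns (clusters, number of crowd queries)."""
    uf = UnionFind(n)
    non_matches: List[Tuple[int, int]] = []  # record pairs the crowd said differ
    queries = 0
    for a, b, _prior in sorted(edges, key=lambda e: e[2], reverse=True):
        ra, rb = uf.find(a), uf.find(b)
        if ra == rb:
            continue  # positive transitivity: a and b already share a cluster
        if any({uf.find(x), uf.find(y)} == {ra, rb} for x, y in non_matches):
            continue  # negative transitivity: their clusters are known to differ
        queries += 1
        if ask_crowd(a, b):
            uf.union(a, b)
        else:
            non_matches.append((a, b))
    clusters: Dict[int, List[int]] = defaultdict(list)
    for r in range(n):
        clusters[uf.find(r)].append(r)
    return sorted(clusters.values()), queries


# Hypothetical ground truth standing in for the crowd: records 0-2 and 3-4.
truth = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B"}
edges = [(0, 1, 0.9), (1, 2, 0.8), (0, 2, 0.7), (3, 4, 0.6),
         (2, 3, 0.3), (0, 3, 0.2), (1, 4, 0.1)]
clusters, queries = resolve(5, edges, lambda a, b: truth[a] == truth[b])
print(clusters, queries)  # [[0, 1, 2], [3, 4]] 4
```

In this toy graph only 4 of the 7 edges reach the crowd. Edge (0, 2) is implied by the two earlier "yes" answers. Edges (0, 3) and (1, 4) are implied by the single "no" answer on (2, 3).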

Citations

Cited by 149 publications (163 citation statements). References: 9 publications.
“…[21,22,30,31]): - Amazon-Google⁴ is a data set containing 4598 product records from 2 data sets. There are a total of 10,527,166 pairs of records, of which 1,300 represent the same entity.…”
Section: Methods (mentioning, confidence: 99%)
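The quoted pair count can be checked against the number of unordered pairs of n records, n(n-1)/2. The short Python check below shows that 10,527,166 equals C(4589, 2), not C(4598, 2). The arithmetic is exact. Reading "4598" as a digit transposition of 4,589 is an inference, not something the quotation states. The check also shows how skewed the classes are: roughly 0.0123% of all pairs are duplicates, which illustrates why sending every pair to the crowd would be impractical.

```python
from math import comb

QUOTED_PAIRS = 10_527_166  # pair count stated in the quotation
MATCHES = 1_300            # duplicate pairs stated in the quotation

# Unordered record pairs for n records: n * (n - 1) / 2
for n_records in (4_598, 4_589):
    pairs = comb(n_records, 2)
    print(n_records, pairs, pairs == QUOTED_PAIRS)
# 4598 10568503 False
# 4589 10527166 True

match_rate = MATCHES / QUOTED_PAIRS
print(f"duplicate pairs: {match_rate:.4%}")  # duplicate pairs: 0.0123%
```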
“…[16,30,32,33]). For example, in Isele et al. [16], feedback is sought using active learning on the record pairs on which candidate detailed comparison rules disagree the most.…”
Section: Related Work (mentioning, confidence: 99%)
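The excerpt describes choosing record pairs for feedback where candidate comparison rules disagree most. Below is a minimal query-by-committee sketch of that selection step. It assumes each candidate rule is a boolean matcher and measures disagreement as the fraction of minority votes. The rules (`exact_name`, `same_city`, `shared_token`) and the records are hypothetical. This is not Isele et al.'s actual procedure.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
Pair = Tuple[Record, Record]
Rule = Callable[[Record, Record], bool]  # True means "same entity"


# Hypothetical candidate comparison rules forming the committee.
def exact_name(a: Record, b: Record) -> bool:
    return a["name"].lower() == b["name"].lower()


def same_city(a: Record, b: Record) -> bool:
    return a["city"] == b["city"]


def shared_token(a: Record, b: Record) -> bool:
    return bool(set(a["name"].lower().split()) & set(b["name"].lower().split()))


def disagreement(pair: Pair, committee: List[Rule]) -> float:
    """Minority-vote fraction: 0.0 when unanimous, 0.5 when evenly split."""
    votes = sum(rule(*pair) for rule in committee)
    return min(votes, len(committee) - votes) / len(committee)


def select_for_feedback(pairs: List[Pair], committee: List[Rule], k: int) -> List[Pair]:
    """Return the k pairs on which the committee disagrees the most."""
    return sorted(pairs, key=lambda p: disagreement(p, committee), reverse=True)[:k]


committee = [exact_name, same_city, shared_token]
pairs = [
    ({"name": "Acme Corp", "city": "Boston"}, {"name": "acme corp", "city": "Boston"}),
    ({"name": "Acme Corp", "city": "Boston"}, {"name": "Acme Inc", "city": "Denver"}),
    ({"name": "Globex", "city": "Austin"}, {"name": "Initech", "city": "Reno"}),
]
chosen = select_for_feedback(pairs, committee, k=1)
print(chosen[0][1]["name"])  # Acme Inc
```

The first and third pairs get unanimous votes (all "match" and all "non-match"), so labeling them would teach the committee little. The second pair splits the rules 1 to 2, so it is the one sent for feedback.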
“…We compare our ACD method against four existing state-of-the-art algorithms: TransNode [44], TransM [47], CrowdER [46], and GCER [48]. Note that CrowdER does not specify the algorithm for clustering crowdsourced record pairs.…”
Section: Methods (mentioning, confidence: 99%)
“…For ACD and PC-Pivot, we repeat each of them 5 times in each experiment and report the average measurements, since they are both randomized algorithms. Note that TransNode [44] does not incorporate any parallel mechanism to issue HITs in a batch manner. Therefore, we omit TransNode from the experiments on the number of crowd iterations.…”
Section: Methods (mentioning, confidence: 99%)
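The excerpts mention two things: randomized methods are averaged over 5 runs, and a "crowd iteration" is a batch of HITs issued in parallel. Below is a sketch of both ideas, assuming a perfect crowd oracle. It uses a sequential KwikCluster-style pivot clustering: each round picks one random pivot and issues all of that pivot's comparisons as a single batch. The clustering is repeated with 5 seeds and the results are averaged. This is a simplified stand-in, not PC-Pivot or ACD, neither of which the excerpts describe.

```python
import random
from statistics import mean
from typing import Callable, List, Set, Tuple


def pivot_cluster(
    nodes: List[int], same: Callable[[int, int], bool], rng: random.Random
) -> Tuple[List[Set[int]], int, int]:
    """Pivot clustering where each round compares one random pivot with every
    unclustered record in a single batch. Returns (clusters, rounds, queries)."""
    remaining = set(nodes)
    clusters: List[Set[int]] = []
    rounds = queries = 0
    while remaining:
        pivot = rng.choice(sorted(remaining))
        batch = [v for v in remaining if v != pivot]  # HITs issued together
        rounds += 1
        queries += len(batch)
        cluster = {pivot} | {v for v in batch if same(pivot, v)}
        clusters.append(cluster)
        remaining -= cluster
    return clusters, rounds, queries


# Hypothetical ground truth with three entities: {0, 1}, {2, 3}, {4}.
truth = {0: "A", 1: "A", 2: "B", 3: "B", 4: "C"}
same = lambda a, b: truth[a] == truth[b]

# Randomized, so repeat with 5 seeds and report averages.
runs = [pivot_cluster(list(truth), same, random.Random(seed)) for seed in range(5)]
avg_rounds = mean(r for _, r, _ in runs)
avg_queries = mean(q for _, _, q in runs)
print(avg_rounds, avg_queries)
```

With a perfect oracle, the number of rounds always equals the number of true entities (3 here). The random pivot order changes only how many pairs reach the crowd, between 6 and 8 in this example, which is why averaging over repeated runs is needed.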