Proceedings of the 2006 SIAM International Conference on Data Mining 2006
DOI: 10.1137/1.9781611972764.5

A Latent Dirichlet Model for Unsupervised Entity Resolution

Abstract: In this paper, we address the problem of entity resolution, where given many references to underlying objects, the task is to predict which references correspond to the same object. We propose a probabilistic model for collective entity resolution. Our approach differs from other recently proposed entity resolution approaches in that it is a) unsupervised, b) generative and c) introduces a hidden 'group' variable to capture collections of entities which are commonly observed together. The entity resolution dec…

Cited by 209 publications (187 citation statements)
References 26 publications

“…A recent line of works focuses on the relationships among records [14,30,24,5]. Reference [9] proposed a technique to resolve entities collectively based on the relationship graph among records. Such techniques are not pairwise because they generally examine all or part of the dataset to learn match decisions.…”
Section: Related Work (mentioning)
confidence: 99%
“…Context matchers commonly represent contextual information (e.g., semantic relationships, hierarchies) in a graph structure, see for example [16,34,7,6,15,21,22,51]. The graph structure allows the propagation of similarity information (e.g., represented as edge weights or auxiliary nodes) to related entities.…”
Section: Matchers (mentioning)
confidence: 99%
“…First we keep only terms consisting of alphanumeric characters, the hyphen, and the apostrophe, then we delete all stop-words enumerated in the Onix list³, and then the text is run through a tree-tagger software for lemmatization⁴. Then 1. for every training corpus C_i we take the top.tf terms with top tf values (calculated w.r.t. C_i) (the resulting set of terms is denoted by W_i), 2. we unify these term collections over the categories, that is, let W = {W_i :…”
Section: Term Selection (mentioning)
confidence: 99%
“…LDA is an intensively studied model, and the experiments are really impressive compared to other known information retrieval techniques. The applications of LDA include entity resolution [4], fraud detection in telecommunication systems [5], image processing [6,7,8] and ad-hoc retrieval [9].…”
Section: Introduction (mentioning)
confidence: 99%