Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2014
DOI: 10.3115/v1/p14-1073

Robust Entity Clustering via Phylogenetic Inference

Abstract: Entity clustering must determine when two named-entity mentions refer to the same entity. Typical approaches use a pipeline architecture that clusters the mentions using fixed or learned measures of name and context similarity. In this paper, we propose a model for cross-document coreference resolution that achieves robustness by learning similarity from unlabeled data. The generative process assumes that each entity mention arises from copying and optionally mutating an earlier name from a similar context. Cl…

Citations: cited by 12 publications (17 citation statements)
References: 19 publications
“…Models of spelling errors are useful in a variety of settings including spelling correction itself and phylogenetic models of string variation (Mays et al, 1991; Church and Gale, 1991; Kukich, 1992; Andrews et al, 2014).…”
Section: Methods (mentioning)
confidence: 99%
“…Without a knowledge base, cross-document coreference resolution (CDCR) clusters mentions to form entities (Bagga and Baldwin, 1998b). Since 2011, CDCR has been included as a task in TAC-KBP (Ji et al, 2011) and has attracted renewed interest (Baron and Freedman, 2008b; Green et al, 2012; Andrews et al, 2014). Though traditionally a task restricted to small collections of formal documents (Bagga and Baldwin, 1998b; Baron and Freedman, 2008a), recent work has scaled up CDCR to large heterogenous corpora, e.g.…”
Section: Entity Disambiguation (mentioning)
confidence: 99%
“…across different texts. Relevant work addressing cross-document coreference resolution includes [14][15][16]. [7] uses spectral clustering and graph partitioning, and [17] is based on bag of words, latent similarity and clustering techniques.…”
Section: Related Work (mentioning)
confidence: 99%
“…We aligned pairs of documents from the corpus in all possible ways, and evaluated the results for each pair (171 pairs in total). We computed precision, recall and F-measure of the aligned pairs of nodes across graphs. Since there might be several nodes in an RDF graph that correspond to the same entity (this is due to the way FRED builds RDF representations of input sentences), and we are interested in measuring the quality of the alignment across graphs (not within graphs), we collapsed all equivalent nodes within a graph into single nodes.…”
Section: Experimental Analysis (mentioning)
confidence: 99%