2020
DOI: 10.48550/arxiv.2010.10363
Preprint

Bootleg: Chasing the Tail with Self-Supervised Named Entity Disambiguation

Abstract: A challenge for named entity disambiguation (NED), the task of mapping textual mentions to entities in a knowledge base, is how to disambiguate entities that appear rarely in the training data, termed tail entities. Humans use subtle reasoning patterns based on knowledge of entity facts, relations, and types to disambiguate unfamiliar entities. Inspired by these patterns, we introduce Bootleg, a self-supervised NED system that is explicitly grounded in reasoning patterns for disambiguation. We define core reas…

Cited by 8 publications (10 citation statements)
References 49 publications
“…This work explores how to effectively provide these properties to popular transformer models. Tail entities are those seen < 10 times during training and head entities are seen ≥ 10 times, consistent with Orr et al (2020); Goel et al (2021).…”
Section: Objective (supporting)
confidence: 60%
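The head/tail split quoted above is a simple frequency threshold over the training data. A minimal sketch of that rule, using hypothetical entity IDs and toy counts (only the < 10 threshold comes from the quoted statement):

```python
from collections import Counter

# Per the quoted definition: tail entities are seen < 10 times in
# training; head entities are seen >= 10 times.
TAIL_THRESHOLD = 10

def split_head_tail(training_mentions):
    """Partition entity IDs into head/tail sets by training frequency."""
    counts = Counter(training_mentions)
    head = {e for e, c in counts.items() if c >= TAIL_THRESHOLD}
    tail = {e for e, c in counts.items() if c < TAIL_THRESHOLD}
    return head, tail

# Hypothetical toy data: "Q76" appears 12 times, "Q42" only twice.
mentions = ["Q76"] * 12 + ["Q42"] * 2
head, tail = split_head_tail(mentions)
# head -> {"Q76"}, tail -> {"Q42"}
```

The same counting step is all that is needed to bucket evaluation mentions when reporting head versus tail accuracy separately.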
“…We collect entity metadata from Wikidata for our evaluations, a compelling choice as several works successfully improve tail performance in industrial workloads using the knowledge base (e.g., Orr et al (2020)). We use the state-of-the-art pretrained entity-linking model from Orr et al (2020) to link the text in each task to an October 2020 dump of Wikidata. We use Wikidata and the first sentence of an entity's Wikipedia page to obtain descriptions.…”
Section: Metadata Source (mentioning)
confidence: 99%
“…We now describe a subset of these domains, which we evaluate in Section 8. Additional applicable domains include entity linkage or disambiguation [46,55], nearest neighbor machine translation [28], and nearest neighbor language modeling [29].…”
Section: Motivating Applications (mentioning)
confidence: 99%
“…We use AmbER sets to conduct a systematic study of various retrieval systems that operate under different principles, such as token overlap and dense embedding similarity. Retrievers perform very differently on AmbER sets in terms of absolute retrieval numbers, with Bootleg (Orr et al, 2020), an entity-linking-based retriever, performing best. Despite these differences, all retrievers exhibit a large degree of popularity bias, underperforming on inputs concerning tail entities.…”
Section: Introduction (mentioning)
confidence: 99%