2020
DOI: 10.1609/aaai.v34i05.6380

Fine-Grained Entity Typing for Domain Independent Entity Linking

Abstract: Neural entity linking models are very powerful, but run the risk of overfitting to the domain they are trained in. For this problem, a “domain” is characterized not just by genre of text but even by factors as specific as the particular distribution of entities, as neural models tend to overfit by memorizing properties of frequent entities in a dataset. We tackle the problem of building robust entity linking models that generalize effectively and do not rely on labeled entity linking data with a specific entit…

Cited by 70 publications (102 citation statements)
References 15 publications
“…The expense and complexity of obtaining expert annotations of medical information is frequently cited as a major barrier to advancing machine learning-based technologies in medicine (67,68). While our approach did require expert-annotated data, we were able to achieve strong coding performance using a relatively small dataset of only 400 clinical documents, compared to the thousands of documents used in a recent study on extracting evidence of geriatric syndrome (28) or the tens of thousands used in foundational NLP research (69). Datasets of similar scale have been developed for automatic coding of other types of medical information (70), indicating that for a new type of health information, an initial dataset of a few hundred documents is likely to provide significant signal for machine learning.…”
Section: A Template For Expanding Automated Coding To New Concept Dom… (mentioning)
confidence: 94%
“…In contrast, we use the category relations directly without requiring such additional steps. Onoe and Durrett (2020) use the direct parent categories of hyperlinks for training entity linking systems.…”
Section: Related Work (mentioning)
confidence: 99%
“…In this work, we explore a set of interpretable entity representations that are simultaneously human and machine readable. The key idea of this approach is to use fine-grained entity typing models with large type inventories (Ling and Weld, 2012; Gillick et al., 2014; Choi et al., 2018; Onoe and Durrett, 2020). Given an entity mention and context words, our typing model outputs a high-dimensional vector whose values are associated with predefined fine-grained entity types.…”
Section: Introduction (mentioning)
confidence: 99%
“…Each value ranges between 0 and 1, corresponding to the confidence of the model's decision that the entity has the property given by the corresponding type. We use pre-trained Transformer-based entity typing models, trained either on a supervised entity typing dataset (Choi et al., 2018) or on a distantly-supervised dataset derived from Wikipedia categories (Onoe and Durrett, 2020). The type vectors from these models, which contain tens of thousands of types, are then used as contextualized entity embeddings in downstream tasks.…”
Section: Introduction (mentioning)
confidence: 99%
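
The two Introduction excerpts above describe type vectors whose entries are per-type confidences between 0 and 1, used as entity embeddings. The following is only a minimal sketch of that idea, not the cited authors' implementation: the five-type inventory, the logits, the candidate names, and the choice of a sigmoid and cosine similarity are all assumptions made for illustration.

```python
import math

# Hypothetical, tiny type inventory (assumption). The models quoted above
# use tens of thousands of types.
TYPES = ["person", "politician", "athlete", "city", "company"]


def type_vector(logits):
    """Map per-type logits to independent confidences in (0, 1) via a sigmoid."""
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def top_types(vec, k=2):
    """Return the k type names with the highest confidence (human-readable view)."""
    ranked = sorted(zip(TYPES, vec), key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked[:k]]


# Made-up logits standing in for a typing model's output (assumption).
mention = type_vector([3.0, 2.5, -3.0, -4.0, -2.0])
candidates = {
    "politician_entity": type_vector([3.5, 3.0, -2.5, -4.0, -3.0]),
    "city_entity": type_vector([-3.0, -4.0, -4.0, 3.5, -1.0]),
}

# Pick the candidate whose type vector is most similar to the mention's.
best = max(candidates, key=lambda name: cosine(mention, candidates[name]))
```

Because each entry corresponds to a named type, the same vector can be inspected directly (via `top_types`) and compared numerically (via `cosine`), which is the "human and machine readable" property the excerpt mentions.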