Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL) 2019
DOI: 10.18653/v1/k19-1060
Named Entity Recognition with Partially Annotated Training Data

Abstract: Supervised machine learning assumes the availability of fully-labeled data, but in many cases, such as low-resource languages, the only data available is partially annotated. We study the problem of Named Entity Recognition (NER) with partially annotated training data, in which a fraction of the named entities are labeled and all other tokens, entities or otherwise, are labeled as non-entity by default. In order to train on this noisy dataset, we need to distinguish between the true and false negatives. To thi…
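The setup described in the abstract can be illustrated with a minimal sketch. The sentence, tag names, and BIO scheme below are hypothetical examples chosen for illustration, not data from the paper: only one entity is annotated, every other token defaults to "O", and the unmarked entity becomes a false negative.

```python
# Hedged sketch: partial annotation in NER, where only a fraction of
# entities are labeled and every other token defaults to "O".
# The sentence and tag names are hypothetical, not from the paper.

tokens = ["Barack", "Obama", "visited", "Paris", "yesterday"]

# Fully labeled gold annotation (BIO scheme):
gold = ["B-PER", "I-PER", "O", "B-LOC", "O"]

# Partial annotation: the annotator marked only the person entity;
# "Paris" silently falls back to the default non-entity label.
partial = ["B-PER", "I-PER", "O", "O", "O"]

# A token is a *false negative* if it is labeled "O" in the partial
# annotation but belongs to an entity in the gold annotation.
false_negatives = [
    tok for tok, g, p in zip(tokens, gold, partial)
    if p == "O" and g != "O"
]
print(false_negatives)  # ["Paris"]
```

Training naively on `partial` would teach the model that "Paris" is a non-entity; distinguishing such false negatives from true negatives is the core problem the paper addresses.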

Cited by 42 publications (59 citation statements) | References 27 publications
“…Finally, we believe our algorithm can be applied to save annotation effort for other NLP tasks, especially the low-resource ones (Mayhew et al, 2019).…”
Section: Discussion
confidence: 99%
“…Partial annotation learning (Nooralahzadeh et al., 2019; Cao et al., 2019) takes this into account explicitly. Related approaches learn latent variables, use constrained binary learning (Mayhew et al., 2019), or construct a loss assuming that only unlabeled positive instances exist.…”
Section: Learning With Noisy Labels
confidence: 99%
“…For instance, Garrette and Baldridge (2013) obtain labeled data from non-native speakers and without quality control of the manual annotations. This can be taken even further by employing annotators who do not speak the low-resource language (Mayhew and Roth, 2018; Mayhew et al., 2019; Tsygankova et al., 2020). Nekoto et al. (2020) take the opposite direction, integrating speakers of low-resource languages without formal training into the model development process in an approach of participatory research.…”
Section: Non-expert Support
confidence: 99%
“…There has been a small amount of work using non-speaker annotations (Mayhew et al., 2019a), but mainly as an application of a technique, falling short of the exhaustive study in this paper.…”
Section: Related Work
confidence: 99%
“…We recognize that these annotations are missing many entities. Following recent work on partial annotations, we use an iterative method from Mayhew et al. (2019a) called Constrained Binary Learning (CBL) that detects unmarked tokens likely to be entities and down-weights them in training. Subsequent results reported use this method on all FS and NS annotations.…”
Section: Machine Learning Models
confidence: 99%
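The down-weighting idea described in the last citation statement can be sketched as follows. This is a simplified illustration of the CBL intuition, not the paper's actual algorithm: the scoring function, threshold, and weight values below are hypothetical stand-ins.

```python
# Hedged sketch of the down-weighting idea behind Constrained Binary
# Learning (CBL): tokens labeled "O" that a model scores as likely
# entities receive a reduced weight in the training loss. The
# threshold and low_weight values are hypothetical stand-ins.

def down_weight(labels, entity_probs, threshold=0.5, low_weight=0.1):
    """Return per-token loss weights: 1.0 normally, low_weight for
    tokens labeled "O" whose estimated entity probability exceeds
    the threshold (i.e. suspected false negatives)."""
    weights = []
    for label, p in zip(labels, entity_probs):
        if label == "O" and p > threshold:
            weights.append(low_weight)   # suspected unmarked entity
        else:
            weights.append(1.0)          # trust the annotation
    return weights

# "Paris" is labeled "O" but the model scores it as a likely entity
# (p=0.9), so its contribution to the loss is down-weighted.
labels = ["B-PER", "I-PER", "O", "O", "O"]
entity_probs = [0.95, 0.90, 0.05, 0.90, 0.02]
print(down_weight(labels, entity_probs))  # [1.0, 1.0, 1.0, 0.1, 1.0]
```

In the actual method this scoring and re-weighting is applied iteratively, with the model retrained on the re-weighted data at each step.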