2020
DOI: 10.48550/arxiv.2008.01377
Preprint

Reliable Part-of-Speech Tagging of Historical Corpora through Set-Valued Prediction

Stefan Heid,
Marcel Wever,
Eyke Hüllermeier

Abstract: Syntactic annotation of corpora in the form of part-of-speech (POS) tags is a key requirement for both linguistic research and subsequent automated natural language processing (NLP) tasks. This problem is commonly tackled using machine learning methods, i.e., by training a POS tagger on a sufficiently large corpus of labeled data. While the problem of POS tagging can essentially be considered as solved for modern languages, historical corpora turn out to be much more difficult, especially due to the lack of native speaker…

Cited by 2 publications (2 citation statements)
References 11 publications
“…To make uncertainties and ambiguities recognizable and understandable to the machine annotator, suitable mathematical formalisms as described in Section 4 are necessary to process them consistently in the induction process of annotations models. Furthermore, it is equally important that the machine annotator is not only able to understand uncertainties and ambiguities to build hypotheses about the annotation task, but also to communicate its (model) uncertainty (Heid et al, 2020), which is different than the epistemic uncertainty of the human expert, and identified ambiguities to the human annotator. Thereby, the machine and the human annotator can collaborate on the annotation task at an equal level.…”
Section: Machine Annotator (mentioning)
confidence: 99%
“…The basic idea of the above methods is to use the richer corpus resources language to assist in the PoS tagging of a scarce corpus resources language. However, this approach may lead to semantic mis-understanding 8 .…”
Section: Introduction (mentioning)
confidence: 99%