Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics (EACL '09), 2009
DOI: 10.3115/1609067.1609125

Correcting a PoS-tagged corpus using three complementary methods

Abstract: The quality of the part-of-speech (PoS) annotation in a corpus is crucial for the development of PoS taggers. In this paper, we experiment with three complementary methods for automatically detecting errors in the PoS annotation for the Icelandic Frequency Dictionary corpus. The first two methods are language independent and we argue that the third method can be adapted to other morphologically complex languages. Once possible errors have been detected, we examine each error candidate and hand-correct the corr…
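The abstract does not spell out the three methods in this excerpt, but it ties the third to morphological complexity. As a rough illustration of how such a morphology-aware check could work, here is a minimal sketch: flag tokens whose gold tag falls outside the tag set a morphological analyzer licenses for the word form. The `possible_tags` interface and the toy tags below are my assumptions, not the paper's.

```python
def morphology_filter(corpus, possible_tags):
    """Flag tokens whose gold tag is not among the tags a morphological
    analyzer licenses for the word form (hypothetical interface)."""
    candidates = []
    for i, (word, gold) in enumerate(corpus):
        licensed = possible_tags.get(word)
        # Skip unknown words rather than flooding the candidate list.
        if licensed is not None and gold not in licensed:
            candidates.append((i, word, gold, sorted(licensed)))
    return candidates

# Toy example: "fish" is licensed as noun or verb, so a gold JJ is suspect.
corpus = [("the", "DT"), ("fish", "JJ"), ("swims", "VBZ")]
analyses = {"the": {"DT"}, "fish": {"NN", "VB"}, "swims": {"VBZ"}}
print(morphology_filter(corpus, analyses))  # -> [(1, 'fish', 'JJ', ['NN', 'VB'])]
```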

Cited by 21 publications (30 citation statements); citing publications span 2013–2023.
References 10 publications.
“…Boyd et al. (2007) employed an alignment-based approach to assess differences in the annotation of n-gram word sequences in order to establish the likelihood of error occurrence. Other work in the syntactic inconsistency detection domain includes those related to POS tagging (Loftsson, 2009; Eskin, 2000; Ma et al., 2001) and parse structure (Ule and Simov, 2004). … (2) ambiguity associated with the assessment of hard cases. While annotation errors apply to situations where a correct label can be applied but is not, hard cases are those where the correct label is inherently difficult to assign, and can be particularly relevant to certain classes of MWEs.…”
Section: Related Work
confidence: 99%
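The n-gram line of work quoted above is concrete enough to sketch. A minimal, hypothetical take on variation n-gram detection in the spirit of Dickinson and Meurers (2003): identical word n-grams that recur with different tag sequences are flagged as error candidates. Function and tag names are illustrative only, not taken from any of the cited systems.

```python
from collections import defaultdict

def variation_ngrams(corpus, n=3):
    """Group each word n-gram with the set of tag n-grams it receives;
    n-grams tagged inconsistently across occurrences are error candidates."""
    words = [w for w, _ in corpus]
    tags = [t for _, t in corpus]
    seen = defaultdict(set)
    for i in range(len(corpus) - n + 1):
        seen[tuple(words[i:i + n])].add(tuple(tags[i:i + n]))
    # Keep only n-grams whose occurrences disagree on at least one tag.
    return {gram: variants for gram, variants in seen.items() if len(variants) > 1}

# Toy corpus: the trigram "to run fast" occurs twice with different tags.
corpus = [("to", "TO"), ("run", "VB"), ("fast", "RB"), (".", "."),
          ("to", "TO"), ("run", "VB"), ("fast", "JJ")]
print(variation_ngrams(corpus, n=3))
# -> {('to', 'run', 'fast'): {('TO', 'VB', 'RB'), ('TO', 'VB', 'JJ')}}
```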
“…Quite a bit of work has been devoted to the identification of errors in manually annotated corpora (Eskin, 2000; van Halteren, 2000; Kveton and Oliva, 2002; Dickinson and Meurers, 2003; Loftsson, 2009; Ambati et al., 2011).…”
Section: Related Work
confidence: 99%
“…The query-by-committee strategy calls to mind previous work on error detection in manually labelled text that made use of disagreements between the predictions of a classifier ensemble and the manually assigned tag to identify potential annotation errors in the data (Loftsson, 2009). This approach works surprisingly well, and the tradeoff between precision and recall can be balanced by adding a threshold (i.e.…”
Section: Active Learning
confidence: 99%
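The ensemble-disagreement method and its precision/recall threshold, as described in the statement above, can be sketched directly. A hypothetical committee check: flag a token when at least `min_disagree` taggers contradict the gold tag; raising the threshold trades recall for precision. All names and tags here are illustrative, not drawn from the cited work.

```python
def committee_flags(gold_tags, committee_predictions, min_disagree):
    """Flag positions where at least `min_disagree` committee members
    disagree with the gold tag. A higher threshold flags fewer but more
    reliable candidates (precision); a lower one flags more (recall)."""
    flags = []
    for i, gold in enumerate(gold_tags):
        disagree = sum(1 for preds in committee_predictions if preds[i] != gold)
        if disagree >= min_disagree:
            flags.append(i)
    return flags

# Three hypothetical taggers; position 1 is contested by all three.
gold = ["DT", "JJ", "NN"]
committee = [["DT", "NN", "NN"], ["DT", "NN", "NN"], ["DT", "VB", "NN"]]
print(committee_flags(gold, committee, min_disagree=3))  # -> [1]
```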
“…I hope future research will succeed at making this process more empirical and more predictable (see also Hovy and Lavid, 2010; Garrette and Baldridge, 2013). There is a great deal more to discover with regard to understanding the range of text varieties (Baldwin et al., 2013), building statistical models of annotator bias (Snow et al., 2008; Hovy et al., 2013; Passonneau and Carpenter, 2014), automatically detecting inconsistencies in linguistic data (Dickinson and Meurers, 2003; Loftsson, 2009; Kato and Matsubara, 2010), and bringing extrinsic models into the annotation loop (Baldridge and Osborne, 2004; Baldridge and Palmer, 2009; Settles, 2012).…”
Section: Why You Shouldn't Take My Word For It
confidence: 99%