2011
DOI: 10.1007/978-3-642-19400-9_14

Part-of-Speech Tagging from 97% to 100%: Is It Time for Some Linguistics?

Abstract: I examine what would be necessary to move part-of-speech tagging performance from its current level of about 97.3% token accuracy (56% sentence accuracy) to close to 100% accuracy. I suggest that it must still be possible to greatly increase tagging performance and examine some useful improvements that have recently been made to the Stanford Part-of-Speech Tagger. However, an error analysis of some of the remaining errors suggests that there is limited further mileage to be had either from better mac…
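The gap between 97.3% token accuracy and 56% sentence accuracy can be illustrated with a rough back-of-the-envelope calculation. The sketch below is not taken from the paper: it assumes tagging errors are independent across tokens and that all sentences have the same length, both of which are simplifications, and the function names are hypothetical.

```python
import math


def sentence_accuracy(token_accuracy: float, sentence_length: int) -> float:
    """Chance that every token in a sentence is tagged correctly.

    Assumption (not from the paper): errors are independent across tokens.
    """
    return token_accuracy ** sentence_length


def implied_sentence_length(token_accuracy: float, sentence_acc: float) -> float:
    """Sentence length at which the independence model gives `sentence_acc`."""
    return math.log(sentence_acc) / math.log(token_accuracy)


if __name__ == "__main__":
    # Figures quoted in the abstract: 97.3% token accuracy, 56% sentence accuracy.
    print(implied_sentence_length(0.973, 0.56))
    for n in (10, 20, 30):
        print(n, round(sentence_accuracy(0.973, n), 3))
```

Under these assumptions, the two figures in the abstract are consistent with sentences of about 21 tokens, which shows how a small per-token error rate compounds into a large per-sentence error rate.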

Cited by 282 publications (168 citation statements)
References 17 publications
“…For example, the spaCy tagger would find the compound events_section in the sentence "I keep the events section", but it would miss it in the sentence "I can keep the events section". This is due to the fact that modern NLP taggers are statistical and try to find the most probable tagging in many situations [40]. Furthermore, we do not yet support compounds that consist of three or more nouns, such as "profile page statistics".…”
Section: Analysis and Discussion
Citation type: contrasting
confidence: 66%
“…In corpus linguistics and computational linguistics, part-of-speech tagging (POS tagging) is a process that marks up a word in a text (corpus) as corresponding to a particular part of speech, such as noun, verb, adjective, etc. Automatic POS tagging has a long history in computational linguistic studies, and its tagging accuracy has now reached over 97% (Manning 2011; Toutanova et al. 2003). POS-tagged document vectors can be easily scored by comparing adjectives/adverbs/verbs in documents and those in SentiWordNet, which contains the polarity scores of words (for example, adjective 'bad' has −0.625 and 'worst' has −0.75 in SentiWordNet).…”
Section: Machine Learning Approach (MLA)
Citation type: mentioning
confidence: 99%
“…Although the machine-learning techniques seem to prevail in the aforementioned studies, we have been inspired by the rule-based methods of parser and POS-tagger tuning, much as fixing of Penn Treebank errors with deterministic rules as described by Manning in [20], especially for the purposes of unknown vocabulary recognition. In his study, Manning analyzes the 100 most recurrent parser errors and offers rule-based solutions to specific problems that hinder a parser's good performance, such as lexicon gaps, unknown vocabulary, difficult linguistics, or having no standard to learn from.…”
Section: Related Work
Citation type: mentioning
confidence: 99%