Proceedings of the 20th International Conference on Computational Linguistics - COLING '04 2004
DOI: 10.3115/1220355.1220495

High-performance tagging on medical texts

Abstract: We ran both Brill's rule-based tagger and TNT, a statistical tagger, with a default German newspaper-language model on a medical text corpus. Supplied with limited lexicon resources, TNT outperforms the Brill tagger with state-of-the-art performance figures (close to 97% accuracy). We then trained TNT on a large annotated medical text corpus, with a slightly extended tagset that captures certain medical language particularities, and achieved 98% tagging accuracy. Hence, statistical off-the-shelf POS taggers ca…

Cited by 6 publications
(3 citation statements)
References 11 publications
“…Coden et al (2005) achieved a POS tagging accuracy of 87% when applying a general English language tagger to clinical data, and a more recent study reported a highest accuracy of 88.6% when applied to clinical data (Ferraro et al 2013). On the other hand, Hahn & Wermter (2004) achieved surprisingly high accuracy rates thereby refuting previous claims by Campbell & Johnson (2001) that general language off-the-shelf taggers cannot be used on medical text without adaptation. They found that the statistical tagger TnT, when only trained on a German language newspaper corpus (NEGRA) and subsequently applied to a clinical corpus achieved an accuracy of 95.2%.…”
Section: Automatic Analysis Of Clinical Text (contrasting)
confidence: 66%
“…There has also been research considering clinical natural language processing tasks in German, for instance, sentence boundary and abbreviation detection [30] or part-of-speech tagging [31]. In 2002, Hahn et al [4] described a system for the extraction of information from findings reports, called MEDSYNDIKATE, which heavily builds upon syntactic parsing and handcrafted or automatically assembled domain knowledge.…”
Section: Introduction (mentioning)
confidence: 99%
“…The addition of an 800-word domain-specific lexicon revealed a performance increase of 5% and selecting sentences that contained the most frequent unknown words proved to be most helpful. Hahn et al [47] investigated the use of a rule-based POS-tagger (Brill tagger) and a statistical tagger (TnT) on clinical data. The statistical tagger TnT trained on general texts performed close to the state of the art in the medical domain.…”
Section: State Of the Art In Information Extraction From The Ehr Meth (mentioning)
confidence: 99%