2012
DOI: 10.1186/1471-2105-13-207

A corpus of full-text journal articles is a robust evaluation tool for revealing differences in performance of biomedical natural language processing tools

Abstract: Background: We introduce the linguistic annotation of a corpus of 97 full-text biomedical publications, known as the Colorado Richly Annotated Full Text (CRAFT) corpus. We further assess the performance of existing tools for performing sentence splitting, tokenization, syntactic parsing, and named entity recognition on this corpus. Results: Many biomedical natural language processing systems demonstrated large differences between their previously published results and their performance on the CRAFT corpus when test…

Cited by 108 publications (105 citation statements)
References 45 publications
“…A team consisting of the author, a biology doctoral student with research experience in genetics, and a computer science graduate student with an undergraduate degree in philosophy collaborated on identifying the arguments in the Results section of two articles (Van de Leemput et al, 2007; Lloyd et al, 2005) from the CRAFT corpus (Bada et al, 2012; Verspoor et al, 2012). The CRAFT corpus is open access and has already been linguistically annotated.…”
Section: Preparation (mentioning)
confidence: 99%
“…The Yang et al (2015) system uses LingPipe and ClearNLP (Choi and Palmer, 2011) to parse the questions and relevant snippets using models applicable to generic English texts as well as biomedical texts, e.g. the parser models trained on the CRAFT treebank (Verspoor et al, 2012). It uses the named entity recognition (NER) module from LingPipe trained on the GENIA corpus (Kim et al, 2003) and the MetaMap annotation component (Aronson, 2001) to identify the biomedical concepts, and further uses UMLS Terminology Services (UTS) to identify concepts and retrieve synonyms.…”
Section: Overview Of Yang Et Al (2015) System (mentioning)
confidence: 99%