Proceedings of the Workshop on BioNLP 2007: Biological, Translational, and Clinical Language Processing (BioNLP '07), 2007
DOI: 10.3115/1572392.1572411

A shared task involving multi-label classification of clinical free text

Abstract: This paper reports on a shared task involving the assignment of ICD-9-CM codes to radiology reports. Two features distinguished this task from previous shared tasks in the biomedical domain. One is that it resulted in the first freely distributable corpus of fully anonymized clinical text. This resource is permanently available and will (we hope) facilitate future research. The other key feature of the task is that it required categorization with respect to a large and commercially significant set of labels. T…
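The abstract frames the task as multi-label classification: each radiology report receives a set of ICD-9-CM codes rather than a single label. This page does not state how submissions were scored, so the sketch below assumes micro-averaged F1, one common measure for multi-label output, which pools true positives, false positives and false negatives across all reports before computing precision and recall. The function name `micro_f1` and the example code sets are hypothetical illustrations.

```python
from typing import List, Set, Tuple


def micro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over per-report code sets.

    Counts are pooled across all reports before the ratios are taken,
    so frequent codes weigh more than rare ones.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must cover the same reports")
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)  # codes both assigned and expected
        fp += len(p - g)  # codes assigned but not expected
        fn += len(g - p)  # codes expected but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Illustrative codes only (786.2 cough, 486 pneumonia, 599.0 urinary tract infection).
gold = [{"786.2"}, {"486", "786.2"}, {"599.0"}]
pred = [{"786.2"}, {"486"}, {"599.0", "786.2"}]
print(micro_f1(gold, pred))  # (0.75, 0.75, 0.75)
```

Here three of the four predicted codes are correct (precision 0.75) and three of the four expected codes are found (recall 0.75). A macro-averaged variant, which averages per-code F1 scores instead, would weigh rare codes more heavily.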

Cited by 267 publications (161 citation statements). References 9 publications.
“…This is in part because the use of patient records is subject to strict regulation. Thus, the corpus used for most auto-coding research up to date consists of about two thousand documents annotated with 45 ICD-9 codes (Pestian et al, 2007). It was used in a shared task at the 2007 BioNLP workshop and gave rise to papers studying a variety of rule-based and statistical methods, which are too numerous to list here.…”
Section: Related Work (mentioning, confidence: 99%)
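The statistical methods mentioned in this statement typically treat a fixed code vocabulary (45 ICD-9 codes in this corpus) as a set of binary targets. A minimal sketch of that encoding, assuming a one-vs-rest (binary relevance) setup, turns each report's code set into a 0/1 row; the function name `to_indicator_matrix` and the three-code toy vocabulary are hypothetical and stand in for the real 45-code set.

```python
from typing import List, Sequence, Set


def to_indicator_matrix(label_sets: Sequence[Set[str]],
                        vocabulary: Sequence[str]) -> List[List[int]]:
    """Encode each report's code set as a 0/1 row over a fixed code vocabulary.

    Column j then serves as the target of the j-th binary classifier
    in a one-vs-rest (binary relevance) setup.
    """
    index = {code: j for j, code in enumerate(vocabulary)}
    matrix: List[List[int]] = []
    for codes in label_sets:
        row = [0] * len(vocabulary)
        for code in codes:
            row[index[code]] = 1  # raises KeyError for a code outside the vocabulary
        matrix.append(row)
    return matrix


vocab = ["486", "599.0", "786.2"]  # toy vocabulary, not the corpus's 45-code set
print(to_indicator_matrix([{"786.2"}, {"486", "786.2"}], vocab))
# [[0, 0, 1], [1, 0, 1]]
```

Binary relevance ignores correlations between codes; it is shown here only as the simplest way to map a multi-label problem onto per-code binary classifiers.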
“…These corpora are publicly available and are explained below. ICD9 dataset is an open challenge dataset published by the Computational Medicine Center in 2007 (Pestian et al, 2007). The dataset consists of clinical free text which is a set of 978 anonymized radiology reports and their corresponding ICD-9-CM codes.…”
Section: Datasets (mentioning, confidence: 99%)
“…In 2007 Pestian et al (2007) organised a shared task which introduced a dataset of radiology reports to be autocoded with ICD9 codes. This multi-label classification task attracted a large body of research over the years, e.g. (Farkas and Szarvas, 2008; Suominen et al, 2008), which tackled the problem with methods such as rule-based, decision trees, entropy and SVM classifiers.…”
Section: Related Work (mentioning, confidence: 99%)
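Rule-based classifiers are among the approaches this statement lists. A minimal sketch of that family, assuming a hand-written phrase-to-code table and a crude sentence-level negation check, is given here; the trigger phrases, the cue list and `assign_codes` are hypothetical and do not reproduce any participating system.

```python
import re
from typing import Dict, Set

# Hypothetical trigger phrases mapped to ICD-9-CM codes (illustrative only).
RULES: Dict[str, str] = {
    "pneumonia": "486",
    "cough": "786.2",
    "urinary tract infection": "599.0",
}

# Hypothetical negation/uncertainty cues, matched as whole words.
NEGATION = re.compile(r"\b(no|without|negative for|rule out)\b")


def assign_codes(report: str) -> Set[str]:
    """Assign a code when the first occurrence of its trigger phrase in a
    sentence is not preceded by a cue earlier in that sentence."""
    codes: Set[str] = set()
    # Rough sentence split on ., ! or ? followed by whitespace.
    for sentence in re.split(r"[.!?]\s+", report.lower()):
        for phrase, code in RULES.items():
            pos = sentence.find(phrase)
            if pos == -1:
                continue
            if NEGATION.search(sentence[:pos]):
                continue  # negated or uncertain mention: skip this code
            codes.add(code)
    return codes


print(sorted(assign_codes("Cough for three days. No evidence of pneumonia.")))
# ['786.2']
```

Real systems need far richer negation and uncertainty handling; the whole-word regex only avoids false matches such as "no" inside "piano".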
“…Data contained around 350,000 abstracts from the MEDLINE database over five years, manually created topics, and a topic set based on the standardised MeSH.¹⁰ The Genomics Track [23] [28] and 2011 [29] addressed automated diagnosis coding of radiology reports and classifying the emotions found in suicide notes. In 2007, 1,954 de-identified radiology reports in English from a US radiology department for children were used.…”
Section: Introduction (mentioning, confidence: 99%)