BioNLP 2017
DOI: 10.18653/v1/w17-2342

Automatic Diagnosis Coding of Radiology Reports: A Comparison of Deep Learning and Conventional Classification Methods

Abstract: Diagnosis autocoding is intended to both improve the productivity of clinical coders and the accuracy of the coding. We investigate the applicability of deep learning at autocoding of radiology reports using International Classification of Diseases (ICD). Deep learning methods are known to require large training data. Our goal is to explore how to use these methods when the training data is sparse, skewed and relatively small, and how their effectiveness compares to conventional methods. We identify optimal pa…

Cited by 53 publications (55 citation statements)
References 11 publications

“…The colon and brain cancer THYME corpus was used in several general domain conference and workshop articles (37, 38, 40, 67-69), whereas a radiology report dataset from a 2007 challenge (available from ref. 70) was used in another (71), and SEER-provided (although unshared thus not available for distribution) corpus was used in yet another (72). Other work using ad hoc resources has been used for methods development but this is a less sustainable model due to the rarity of expertise in both cancer and NLP (73-75).…”
Section: Shareable Resources for NLP in Oncology
Citation type: mentioning
confidence: 99%
“…A good scope review into radiology report-processing efforts is also presented in [4]. A more recent work involving the use of artificial neural networks and word embeddings for automated diagnosis coding of radiology reports is reported in [6]. A study used an emergency department's earlier medical records to predict and reduce its overcrowding [7].…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…With respect to automated text classification, in this work, we compared the approaches from the two main paradigms: (1) symbolic text classification, in which texts are represented with sparse vectors of TF-IDF weights, used as input features for traditional machine learning algorithms, such as Logistic Regression (LR) or Support Vector Machine (SVM); and (2) a more recent semantic text classification paradigm, in which dense semantic representations of words (word embeddings) are introduced as input to a neural architecture. Different deep learning architectures have been tried in a number of medical text classification tasks [25-27], including automated classification of radiology reports [6, 28, 29]. While recurrent [29, 30] and attention-based neural networks [27, 31] may present a viable solution, convolutional neural networks (CNN) seem to generally offer an edge in classification performance as well as faster training times [6, 29].…”
Section: Introduction
Citation type: mentioning
confidence: 99%
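
The sparse TF-IDF representation mentioned in the statement above can be illustrated with a minimal sketch. It assumes one common TF-IDF variant (relative term frequency multiplied by the log of inverse document frequency) and whitespace tokenization; the function name `tfidf_vectors` and the toy reports are hypothetical and are not taken from the cited work or the paper it cites.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document.

    Assumed variant: weight = (count / doc length) * log(N / df).
    Other TF-IDF formulations (smoothing, normalization) exist.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors


# Hypothetical toy reports, invented for illustration only.
reports = [
    "no acute cardiopulmonary disease",
    "right lower lobe pneumonia",
    "no evidence of pneumonia",
]
vectors = tfidf_vectors(reports)
```

Each dictionary stores only the terms that occur in its document, which is what makes the representation sparse; a linear classifier such as logistic regression or an SVM would then be trained on these weights.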
“…However, there is still a lack of systematic study on how to select appropriate data to pretrain word vectors or LMs. We observe a range of heuristic strategies in the literature: (1) collecting a large amount of generic data, e.g., web crawl (Pennington et al., 2014; Mikolov et al., 2018); (2) selecting data from a similar field (the subject matter of the content being discussed), e.g., biology (Chiu et al., 2016; Karimi et al., 2017); and, (3) selecting data from a similar tenor (the participants in the discourse, their relationships to each other, and their purposes), e.g., Twitter, or online forums (Li et al., 2017; Chronopoulou et al., 2019). In all these settings, the decision is based on heuristics and varies according to the individual's experience.…”
Section: Introduction
Citation type: mentioning
confidence: 99%