Proceedings of the 19th SIGBioMed Workshop on Biomedical Language Processing 2020
DOI: 10.18653/v1/2020.bionlp-1.8
Experimental Evaluation and Development of a Silver-Standard for the MIMIC-III Clinical Coding Dataset

Abstract: Clinical coding is currently a labour-intensive, error-prone, but critical administrative process whereby hospital patient episodes are manually assigned codes by qualified staff from large, standardised taxonomic hierarchies of codes. Automating clinical coding has a long history in NLP research and has recently seen novel developments setting new state-of-the-art results. A popular dataset used in this task is MIMIC-III, a large intensive care database that includes clinical free text notes and associated co…

Cited by 16 publications (10 citation statements) · References 17 publications
“…They contain indicators for a diagnosis or procedure but miss the corresponding ICD-9 code. This is consistent with results from Searle et al (2020) showing that MIMIC III is up to 35% under-coded. Additionally we find that procedures that are almost always performed in the ICU such as Puncture of vessel are often coded inconsistently.…”
Section: There Is No Ground Truth In Clinical Data (supporting)
confidence: 91%
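The 35% figure rests on comparing codes whose textual indicators appear in the notes against the codes actually assigned to the admission. The sketch below shows one way such an under-coding rate could be computed; all data structures and names are hypothetical illustrations, not the cited authors' code.

```python
# Hypothetical sketch of an under-coding estimate in the spirit of
# Searle et al. (2020). Assumes that, per admission, we already have the set
# of ICD-9 codes assigned by coders and the set of codes whose textual
# indicators an NLP pipeline found in the notes. Names are illustrative.

def under_coding_rate(assigned: dict[str, set[str]],
                      indicated: dict[str, set[str]]) -> float:
    """Fraction of text-indicated codes missing from the assigned codes,
    pooled over all admissions."""
    missing = total = 0
    for adm_id, codes_in_text in indicated.items():
        codes_assigned = assigned.get(adm_id, set())
        missing += len(codes_in_text - codes_assigned)
        total += len(codes_in_text)
    return missing / total if total else 0.0

# Toy example: an admission where a routinely performed procedure
# ('Puncture of vessel', here 38.93) is documented in the notes but never coded.
assigned = {"adm1": {"401.9", "250.00"}}
indicated = {"adm1": {"401.9", "250.00", "38.93"}}
print(under_coding_rate(assigned, indicated))  # ≈ 0.33
```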
“…We want to know if our model is also able to predict diagnoses that are not mentioned in the text. We annotate the admission texts with ICD-9 diagnosis codes with the methodology described by Searle et al (2020). We then evaluate on codes that were explicitly mentioned in the text and those that were not.…”
Section: Discussion and Findings (mentioning)
confidence: 99%
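The evaluation split described in this statement, scoring separately on gold codes that are and are not explicitly mentioned in the text, can be illustrated with a small sketch. The helper below is an assumption for exposition, not the cited authors' implementation.

```python
# Illustrative sketch: score model predictions separately on gold ICD-9 codes
# that are explicitly mentioned in the admission text versus those that are
# not. Function and variable names are assumptions.

def recall_by_mention(gold: set[str], predicted: set[str],
                      mentioned: set[str]) -> tuple[float, float]:
    """Recall over gold codes mentioned in the text vs. unmentioned ones."""
    def recall(subset: set[str]) -> float:
        return len(subset & predicted) / len(subset) if subset else float("nan")
    return recall(gold & mentioned), recall(gold - mentioned)

gold = {"401.9", "250.00", "428.0"}   # codes assigned to the admission
mentioned = {"401.9", "250.00"}       # codes found in the text (silver standard)
predicted = {"401.9", "428.0"}        # model output
print(recall_by_mention(gold, predicted, mentioned))  # (0.5, 1.0)
```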
“…Crowd-sourcing platform: Another possible research area is the development of a crowd-sourced clinical coding and classification platform where experts can guide and share their views, ideas and knowledge with less experienced coders and researchers. A study by Searle et al [79] found that frequently assigned codes in MIMIC-III data display signs of under-coding of up to 35%. No other study has attempted to validate the MIMIC-III data due to the time-consuming and costly nature of the endeavour.…”
Section: Future Research Directions (mentioning)
confidence: 99%
“…No other study has attempted to validate the MIMIC-III data due to the time-consuming and costly nature of the endeavour. For example, if two clinical coders worked 38 hours a week re-coding all 52,726 admission notes at a rate of 5 minutes and $3 per document, that would amount to approximately $316,000 (US) and approximately 115 weeks to create a gold-standard dataset [79]. Therefore, if a crowd-sourced knowledge-based platform were created, the problems of over-coding, under-coding and the lack of data sources using the latest coding version could be resolved.…”
Section: Future Research Directions (mentioning)
confidence: 99%
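The quoted figures are reproducible under one reading of the estimate: both coders annotate every note (dual annotation) and work in parallel. A quick check of the arithmetic, with all constants taken from the quote:

```python
# Back-of-the-envelope check of the dual-annotation cost estimate quoted
# above: 52,726 admission notes, 5 minutes and $3 per document per coder,
# two coders each working 38 hours a week. The dual-annotation reading is
# an assumption; it is the one that reproduces both quoted figures.

NOTES = 52_726
CODERS = 2
MIN_PER_DOC = 5
USD_PER_DOC = 3
HOURS_PER_WEEK = 38

total_cost = NOTES * USD_PER_DOC * CODERS        # each note coded twice
total_hours = NOTES * MIN_PER_DOC * CODERS / 60
weeks = total_hours / (HOURS_PER_WEEK * CODERS)  # coders work in parallel

print(f"${total_cost:,}")      # $316,356 ≈ the quoted ~$316,000 (US)
print(f"{weeks:.1f} weeks")    # 115.6   ≈ the quoted ~115 weeks
```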