2018
DOI: 10.1007/978-3-319-75487-1_38

Supervised Topic Models for Diagnosis Code Assignment to Discharge Summaries

Abstract: Mining medical data has gained significant interest in recent years thanks to advances in the data mining and machine learning fields. In this work, we focus on a challenging issue in medical data mining: automatic diagnosis code assignment to discharge summaries, i.e., characterizing a patient's hospital stay (diseases, symptoms, treatments, etc.) with a set of codes usually derived from the International Classification of Diseases (ICD). We cast the problem as a machine learning task and we experiment so…

citations
Cited by 6 publications
(4 citation statements)
references
References 11 publications
“…It is difficult to compare our results with the literature since the problem definition and the finality are not always the same. For instance in terms of F1-Score, [7] obtained 46% for Spanish text classification, [4] reached 74% in hematology unit with 30 diagnosis, [3] attained 88% in radiology with 45 diagnosis, [8] obtained 76% with 6 diagnosis, [9] reached 46% with 500 diagnosis while in our work, we obtained 83% with 346 diagnosis.…”
Section: Discussion (supporting)
confidence: 51%
“…Depending on the type of data, multiple methods have been applied ranging from simple regression to advanced DL approaches with the objective to maximize performance and improve the quality of cares [3]. In the last decade, Authors in [4] experimented probabilistic topic models on collected DSs within urology and hematology services by comparing models issued from both classical ML approaches (Decision Tree, Naïve Bayes, and SVM) and modern NLP approaches (supervised Latent Dirichlet Allocation (LDA) and labeled LDA). In [5] authors presented a multimodal machine learning model to cope with different type of data including unstructured text, semi-structured text and structured tabular data for which Text Convolutional Neural Network (CNN), Bidirectional LSTM and decision trees were respectively applied.…”
Section: Introduction (mentioning)
confidence: 99%
“…Although we have explored several levels of granularity, namely, Chapter, Main and Full granularity, we have focused on the complete ICD, as it is of great importance for applications such as insurance billing or other clinical information extraction tasks. Often, previous works discarded learning ICDs that had little prevalence in the set or which only focused on a set of nearly a hundred labels [48,50,34]. In our case, we assessed them all, but as we could expect, prevalent ICDs are predicted more accurately than the average prediction quality.…”
Section: Discussion (mentioning)
confidence: 96%
“…Compared to the related works, the input EHR is not restricted to a diagnostic phrase of few words as in [49] or a short note as in [48]. Our EHRs comprise full notes with 864 ± 415 words on average it is close to a full clinical history of MIMIC-III, entailing several notes for a given patient, with 1,399 ± 721 words per history on average.…”
Section: Discussion (mentioning)
confidence: 99%