Validation of deep learning natural language processing algorithm for keyword extraction from pathology reports in electronic health records

Kim, Yoojoong; Lee, Ju Han; Choi, Suna; Kim, Jong Ho; Seok, Junhee; Joo, Hyung Joon

doi:10.1038/s41598-020-77258-w

Cited by 36 publications

(33 citation statements)

References 26 publications


“…Modeling of the epilepsy dataset with the LSTM, BiLSTM, and CNN models, which are popular deep learning models, are presented comparatively. When this study is compared to other works [6][7][8][9][10][11][12][13][14][15][16][17][18][19], no study like this study was found for the actual MRI data set of patients with the epilepsy. Among our proposed methods, due to the BiLSTM network's ability to operate both backward and forward information, it has been proven to give the best classification result.…”

Section: Discussion (mentioning)

confidence: 92%

“…The clinical reports for the breast pathology were identified by the rule based and boosting methods in the [15] by the 97% classification accuracy rate. The pathology reports were analyzed for the keyword extraction by the fine-tuning and deep learning approaches in the [16]. The bidirectional encoder representations from transformers model was successful model for the precision and recall values.…”

Section: Introduction (mentioning)

confidence: 99%


Epilepsy Radiology Reports Classification Using Deep Learning Networks

Bayrak¹, Yücel², Takçı³. 2022. Computers, Materials & Continua

The automatic and accurate classification of Magnetic Resonance Imaging (MRI) radiology reports is essential for the analysis and interpretation of epilepsy and non-epilepsy. Since the majority of MRI radiology reports are unstructured, manual information extraction is time-consuming and requires specific expertise. In this paper, a comprehensive method is proposed to classify epilepsy and non-epilepsy real brain MRI radiology text reports automatically. This method combines Natural Language Processing techniques and statistical Machine Learning methods. 122 real MRI radiology text reports (97 epilepsy, 25 non-epilepsy) are studied by our proposed method, which consists of the following steps: (i) for a given text report, our system first cleans HTML/XML tags, tokenizes, erases punctuation, and normalizes the text; (ii) it then converts the MRI text reports into numeric sequences by using index-based word encoding; (iii) we then apply deep learning models, namely a uni-directional long short-term memory (LSTM) network, a bidirectional long short-term memory (BiLSTM) network, and a convolutional neural network (CNN), to compare classification of the data; (iv) finally, we use 70% of the observations for training, 15% for validation, and 15% for testing. Unlike previous methods, this study encompasses the following objectives: (a) to extract significant text features from radiologic reports of epilepsy disease; (b) to ensure successful classification accuracy to enhance epilepsy data attributes. Therefore, our study is a comprehensive comparative study with the epilepsy dataset obtained from numeric sequences by using the index-based word encoding method applied to the deep learning models. The traditional method of numeric sequences using index-based word encoding, which is used here for the first time in the literature, is a successful feature descriptor for the epilepsy data set. The BiLSTM network has shown promising performance regarding the accuracy rates. We show that larger medical text reports can be analyzed by our proposed method.
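Steps (i), (ii), and (iv) of the abstract above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation: the function names (`preprocess`, `build_vocab`, `encode`, `split_indices`), the choice of index 0 for both padding and unknown words, and the unshuffled split are all assumptions, since the abstract does not specify them.

```python
import re
import string


def preprocess(report: str) -> list[str]:
    """Step (i): strip HTML/XML tags, lowercase, drop punctuation, tokenize on whitespace."""
    text = re.sub(r"<[^>]+>", " ", report)  # remove anything that looks like a tag
    text = text.lower()                      # simple normalization (assumed)
    text = text.translate(str.maketrans("", "", string.punctuation))
    return text.split()


def build_vocab(token_lists: list[list[str]]) -> dict[str, int]:
    """Assign each distinct word an integer index in order of first appearance."""
    vocab: dict[str, int] = {}
    for tokens in token_lists:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab) + 1  # index 0 is reserved (assumption)
    return vocab


def encode(tokens: list[str], vocab: dict[str, int], max_len: int) -> list[int]:
    """Step (ii): index-based word encoding, truncated or zero-padded to max_len."""
    ids = [vocab.get(tok, 0) for tok in tokens][:max_len]  # unknown words map to 0
    return ids + [0] * (max_len - len(ids))


def split_indices(n: int, train: float = 0.70, val: float = 0.15):
    """Step (iv): 70/15/15 split by position; shuffling or stratification is not shown."""
    n_train = int(n * train)
    n_val = int(n * val)
    idx = list(range(n))
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
```

The encoded sequences would then be fed to the LSTM, BiLSTM, or CNN classifier of step (iii), which is omitted here because the abstract gives no architectural details.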

“…Other excellent applications of BERT-based text models include the prediction of relative value units (RVU’s) via report complexity for pathologist compensation calculations (which is related to primary code assignment) and the detection of cases that may have been mis-billed (e.g., a code of lower complexity was assigned), which can potentially save the hospital resources. [ 61 ] We are currently developing a web application that will both interface with the Pathology Information System and can be used to estimate the fiscal impact of underbilling by auditing reports with false positive findings. Tools such as Inspirata can also provide additional structuring for our pathology reports outside of existing schemas.…”

Section: Discussion (mentioning)

confidence: 99%

Comparison of machine-learning algorithms for the prediction of Current Procedural Terminology (CPT) codes from pathology reports

Levy, N², Haudenschild, et al. 2022. Journal of Pathology Informatics

Background: Pathology reports serve as an auditable trial of a patient’s clinical narrative, containing text pertaining to diagnosis, prognosis, and specimen processing. Recent works have utilized natural language processing (NLP) pipelines, which include rule-based or machine-learning analytics, to uncover textual patterns that inform clinical endpoints and biomarker information. Although deep learning methods have come to the forefront of NLP, there have been limited comparisons with the performance of other machine-learning methods in extracting key insights for the prediction of medical procedure information, which is used to inform reimbursement for pathology departments. In addition, the utility of combining and ranking information from multiple report subfields as compared with exclusively using the diagnostic field for the prediction of Current Procedural Terminology (CPT) codes and signing pathologists remains unclear. Methods: After preprocessing pathology reports, we utilized advanced topic modeling to identify topics that characterize a cohort of 93,039 pathology reports at the Dartmouth-Hitchcock Department of Pathology and Laboratory Medicine (DPLM). We separately compared XGBoost, SVM, and BERT (Bidirectional Encoder Representation from Transformers) methodologies for the prediction of primary CPT codes (CPT 88302, 88304, 88305, 88307, 88309) as well as 38 ancillary CPT codes, using both the diagnostic text alone and text from all subfields. We performed similar analyses for characterizing text from a group of the 20 pathologists with the most pathology report sign-outs. Finally, we uncovered important report subcomponents by using model explanation techniques. Results: We identified 20 topics that pertained to diagnostic and procedural information. Operating on diagnostic text alone, BERT outperformed XGBoost for the prediction of primary CPT codes. When utilizing all report subfields, XGBoost outperformed BERT for the prediction of primary CPT codes. Utilizing additional subfields of the pathology report increased prediction accuracy across ancillary CPT codes, and performance gains for using additional report subfields were high for the XGBoost model for primary CPT codes. Misclassifications of CPT codes were between codes of a similar complexity, and misclassifications between pathologists were subspecialty related. Conclusions: Our approach generated CPT code predictions with an accuracy that was higher than previously reported. Although diagnostic text is an important source of information, additional insights may be extracted from other report subfields. Although BERT approaches performed comparably to the XGBoost approaches, they may lend valuable information to pipelines that combine image, text, and -omics information. Future resource-saving opportunities exist to help hospitals detect mis-billing, standardize report text, and estimate productivity metrics that pertain to pathologist c...


“…In formula (10), $p_1$ and $p_2$ represent the conditional probability of intention label and slot label; $M_2^v$ represents the probability vector of softmax transport slot label corresponding to each word; $v$ represents the number of corresponding slot labels; $M_1$ represents the intention probability vector output by softmax. The probability of the corresponding category in each dimension and the maximum probability are taken as the intention category predicted by the sample [33]. Intent recognition and slot filling share the same encoder.…”

Section: Establishing Spoken Language Understanding Model (mentioning)

confidence: 99%
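
The prediction rule described in the citation statement above (apply softmax, then take the category with the maximum probability) can be illustrated with a small sketch. Formula (10) itself is not reproduced on this page, so this is only a generic softmax-plus-argmax example; the function names and the use of raw score lists as inputs are assumptions, and the shared encoder that would produce those scores is not shown.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Convert raw scores to probabilities; subtracting the max keeps exp() stable."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_intent(intent_logits: list[float]) -> int:
    """Return the index of the intent category with the highest softmax probability."""
    probs = softmax(intent_logits)
    return max(range(len(probs)), key=probs.__getitem__)


def predict_slots(slot_logits_per_word: list[list[float]]) -> list[int]:
    """Apply the same argmax rule independently to each word's slot-label scores."""
    return [predict_intent(logits) for logits in slot_logits_per_word]
```

In a joint model of the kind the statement describes, the intent scores would come from the whole utterance and the slot scores from each word position.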

Research on Spoken Language Understanding Based on Deep Learning

Yanli. 2021. Scientific Programming

Aiming at solving the problem that the recognition effect of rare slot values in spoken language is poor, which affects the accuracy of oral understanding task, a spoken language understanding method is designed based on deep learning. The local features of semantic text are extracted and classified to make the classification results match the dialogue task. An intention recognition algorithm is designed for the classification results. Each datum has a corresponding intention label to complete the task of semantic slot filling. The attention mechanism is applied to the recognition of rare slot value information, the weight of hidden state and corresponding slot characteristics are obtained, and the updated slot value is used to represent the tracking state. An auxiliary gate unit is constructed between the upper and lower slots of historical dialogue, and the word vector is trained based on deep learning to complete the task of spoken language understanding. The simulation results show that the proposed method can realize multiple rounds of man-machine spoken language. Compared with the spoken language understanding methods based on cyclic network, context information, and label decomposition, it has higher accuracy and F1 value and has higher practical application value.
