BACKGROUND Modern medicine generates unstructured data containing a large amount of information. Extracting useful knowledge from these data and making scientific decisions for diagnosing and treating diseases have become increasingly necessary. Unstructured data, such as those in the Medical Information Mart for Intensive Care III (MIMIC-III) dataset, contain many ambiguous words that reflect the subjectivity of physicians. Quantifying these words could further improve the accuracy of medical decision-support systems. OBJECTIVE We propose using the fuzzy c-means (FCM) method and Gaussian membership functions to quantify subjective words in the clinical dataset MIMIC-III. METHODS Using 381,091 radiology reports collected from MIMIC-III, we extracted words expressing degrees of subjective certainty from the text and converted them into corresponding membership intervals. RESULTS The words representing each degree for each disease were mapped to a range of values. Examples of membership medians were atelectasis (2.971), pneumonia (3.121), pneumothorax (2.899), pulmonary edema (3.051), and pulmonary embolus (2.435). These membership intervals can help characterize the symptoms of each disease. CONCLUSIONS In this study, we used FCM and Gaussian functions to extract words from MIMIC-III that express a subjective degree and cannot otherwise be processed by a computer, and applied fuzzy processing to them. We conclude that degree words in English radiology reports can be extracted and quantified. The use of these words in medical support systems may improve diagnostic accuracy.
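The core of the quantification step is the Gaussian membership function, which maps a degree word (e.g. "possible", "likely") to a fuzzy interval rather than a single number. The following is a minimal sketch of that idea; the word list, scale, and sigma value are illustrative assumptions, not the paper's actual parameters.

```python
import numpy as np

def gaussian_membership(x, c, sigma):
    """Gaussian membership: degree to which x belongs to the fuzzy set centred at c."""
    return np.exp(-((x - c) ** 2) / (2 * sigma ** 2))

# Hypothetical degree words mapped to centres on a 1-5 certainty scale
degree_words = {"unlikely": 1.5, "possible": 3.0, "likely": 4.0, "definite": 5.0}

x = np.linspace(1, 5, 401)
for word, centre in degree_words.items():
    mu = gaussian_membership(x, centre, sigma=0.5)
    # membership interval: values of x where membership exceeds an alpha-cut of 0.5
    interval = x[mu >= 0.5]
    print(f"{word:9s} interval = [{interval.min():.2f}, {interval.max():.2f}]")
```

In the full method, FCM clustering would supply the centres from the corpus rather than hand-assigned values, and the resulting intervals give each disease mention a quantified range like the medians reported above.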
BACKGROUND In medical text processing, the classification methods with high interpretability are primarily dictionary- or rule-based, which require substantial labor to maintain. In recent years, however, image classification methods with accuracy exceeding that of humans have been developed. OBJECTIVE This research attempts to apply transfer learning from image classification to text classification by converting medical text into images, aiming for better performance. METHODS The method applies word embedding to the preprocessed text for feature extraction and then segments the embeddings to generate a grayscale image. A pretrained deep learning model was then applied to the grayscale images via transfer learning, and the accuracy of the image classifier was validated by comparing it with that of a non-pretrained deep network and a conventional text classifier. RESULTS Transfer learning using pretrained ResNet yielded better validation accuracy. The validation accuracies of ResNet-18, -34, and -50 were 85.7%, 93.0%, and 98.9%, respectively, whereas the accuracy of naïve Bayes was 92.4%; the pretrained model also converged faster than the non-pretrained model. CONCLUSIONS The feasibility of applying deep learning methods from image processing to text was demonstrated by converting the text format to the image format.
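The text-to-image conversion can be pictured as stacking word vectors row-wise and rescaling them to pixel intensities. The sketch below assumes a fixed image height and a toy embedding table; the actual paper's segmentation scheme and embedding model are not specified here, so treat every parameter as an assumption.

```python
import numpy as np

def text_to_grayscale(tokens, embed, height=32, dim=64):
    """Stack word vectors row-wise and rescale to 0-255 to form a grayscale 'image'."""
    rows = [embed.get(t, np.zeros(dim)) for t in tokens[:height]]
    # pad with zero rows so every report yields a fixed-size image
    while len(rows) < height:
        rows.append(np.zeros(dim))
    img = np.vstack(rows)
    lo, hi = img.min(), img.max()
    if hi > lo:
        img = (img - lo) / (hi - lo)  # min-max normalize to [0, 1]
    return (img * 255).astype(np.uint8)

# toy embedding table (a real pipeline would use trained word vectors)
rng = np.random.default_rng(0)
vocab = {"pleural": rng.normal(size=64), "effusion": rng.normal(size=64)}
image = text_to_grayscale(["pleural", "effusion"], vocab)
print(image.shape)  # (32, 64)
```

A fixed-size uint8 array like this can be fed directly to an ImageNet-pretrained ResNet after channel replication and resizing, which is the premise of the transfer-learning comparison.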
BACKGROUND Named entity recognition (NER) plays an important role in extracting descriptive features when mining free-text radiology reports. However, the performance of existing NER tools is limited because the number of recognizable entities depends on dictionary lookup. In particular, the recognition of compound terms is complicated because they occur in a variety of patterns. OBJECTIVE The objective of this study is to develop and evaluate an NER tool that handles compound terms, using RadLex, for mining free-text radiology reports. METHODS We leveraged the clinical Text Analysis and Knowledge Extraction System (cTAKES) to develop customized pipelines using both RadLex and SentiWordNet (a general-purpose dictionary, GPD). We manually annotated 400 radiology reports for compound terms (CTs) in noun phrases and used them as the gold standard for performance evaluation (precision, recall, and F-measure). Additionally, we created a compound-term-enhanced dictionary (CtED) by analyzing false negatives (FNs) and false positives (FPs), and applied it to another 100 radiology reports for validation. We also evaluated the stem terms of compound terms by defining two measures: an occurrence ratio (OR) and a matching ratio (MR). RESULTS The F-measure of cTAKES+RadLex+GPD was 32.2% (precision 92.1%, recall 19.6%), and that of the pipeline combined with the CtED was 67.1% (precision 98.1%, recall 51.0%). The OR indicated that the stem terms "effusion", "node", "tube", and "disease" were used frequently, but the pipeline still failed to capture some CTs. The MR showed that 71.9% of stem terms matched those of the ontologies, and RadLex improved the MR by about 22% over the cTAKES default dictionary. The OR and MR revealed that the characteristics of stem terms have the potential to help generate synonymous phrases using ontologies.
CONCLUSIONS We developed a RadLex-based customized pipeline for parsing radiology reports and demonstrated that the CtED and stem term analysis have the potential to improve dictionary-based NER performance by expanding vocabularies.
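The abstract names the OR and MR measures but not their exact formulas, so the sketch below encodes one plausible reading: OR as the fraction of compound terms headed by a given stem, and MR as the fraction of stems found in the ontology vocabulary. Both definitions and all example terms are assumptions for illustration only.

```python
def occurrence_ratio(stem, compound_terms):
    """Assumed OR: fraction of compound terms whose head word is `stem`."""
    hits = sum(1 for ct in compound_terms if ct.split()[-1] == stem)
    return hits / len(compound_terms)

def matching_ratio(stems, ontology_terms):
    """Assumed MR: fraction of stem terms also present in the ontology vocabulary."""
    onto = set(ontology_terms)
    return sum(1 for s in stems if s in onto) / len(stems)

# toy data standing in for the annotated compound terms and RadLex entries
cts = ["pleural effusion", "pericardial effusion", "chest tube", "lymph node"]
print(occurrence_ratio("effusion", cts))  # 0.5
print(matching_ratio(["effusion", "tube", "node", "xyz"], ["effusion", "tube", "node"]))  # 0.75
```

Under this reading, a high OR flags productive stems worth adding to the CtED, while the MR quantifies how much of that stem vocabulary an ontology such as RadLex already covers.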
BACKGROUND Dictionary-based named entity recognition (NER) with standardized terminology in radiology reports has the advantage of expressing the relationships between extracted compound terms. However, it is not as accurate as methods that implement machine learning. OBJECTIVE To improve the accuracy of terminology extraction in NER, we attempt to expand the terminology dictionary using RadLex, a representative standardized terminology ontology in the field of radiology. While examining trends in the words appearing in radiology reports, we added terminologies that could not be recognized by RadLex to the dictionary of the analysis tools, and further studied the accuracy of these terms. METHODS In this study, 163,201 findings and impressions from MIMIC-III were used to extract words for dictionary expansion using Word2Vec. The Word2Vec parameters that yield the most appropriate similar words for lexicon expansion are discussed in this paper. RESULTS The best synonyms were obtained at epoch 7 of the hierarchical-softmax-based skip-gram algorithm. CONCLUSIONS With these parameters, we can construct a model, input the modifiers of compound words, and append compound words to the dictionary in descending order of cosine similarity.
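The final step of the pipeline ranks candidate dictionary entries by cosine similarity to a query vector. A minimal sketch of that ranking is shown below with hand-made toy vectors; in the actual method the vectors would come from a Word2Vec model trained with skip-gram and hierarchical softmax (in gensim terms, `sg=1, hs=1, epochs=7`), and the query word is purely hypothetical.

```python
import numpy as np

def rank_by_cosine(query_vec, candidate_vecs):
    """Return candidate words sorted by descending cosine similarity to the query vector."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    scored = [(w, cos(query_vec, v)) for w, v in candidate_vecs.items()]
    return sorted(scored, key=lambda p: p[1], reverse=True)

# toy vectors standing in for trained Word2Vec embeddings
vecs = {"nodule": np.array([1.0, 0.1]),
        "mass": np.array([0.9, 0.2]),
        "tube": np.array([0.0, 1.0])}
query = np.array([1.0, 0.0])  # e.g. the vector of a compound-word modifier (hypothetical)
ranking = rank_by_cosine(query, vecs)
print([w for w, _ in ranking])  # ['nodule', 'mass', 'tube']
```

Appending the top-ranked words to the analysis tool's dictionary, in this order, is what the conclusion describes as extending RadLex coverage with corpus-derived terms.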