Assessment of Medical Reports Uncertainty through Topic Modeling and Machine Learning

Shang, Mengyuan; Furst, Jacob; Stan-Raicu, Daniela

doi:10.1109/cbms49503.2020.00043

Cited by 3 publications

(1 citation statement)

References 6 publications


“…The LDA method finds important clinical problems and formats clinical text reports in another investigation [15]. In other work, [16] used topic modeling to express scientific reports efficiently, allowing the analysis of the collections more quickly. Probabilistic-based topic modeling is applied to find the basic topics of the biomedical text collection.…”

mentioning

confidence: 99%

A novel multiple kernel fuzzy topic modeling technique for biomedical data

et al. 2022


Background: Text mining in the biomedical field has received much attention and is regarded as an important research area, since a lot of biomedical data is in text format. Topic modeling is one of the popular text mining methods used to discover hidden semantic structures, so-called topics. However, discovering topics from biomedical data is a challenging task due to its sparsity, redundancy, and unstructured format. Methods: In this paper, we propose a novel multiple kernel fuzzy topic modeling (MKFTM) technique using a fusion probabilistic inverse document frequency and a multiple kernel fuzzy c-means clustering algorithm for biomedical text mining. In detail, the proposed fusion probabilistic inverse document frequency method is used to estimate the weights of global terms, while MKFTM generates frequencies of local and global terms with bag-of-words. In addition, principal component analysis is applied to eliminate higher-order negative effects for term weights. Results: Extensive experiments are conducted on six biomedical datasets. MKFTM achieved the highest classification accuracies of 99.04%, 99.62%, 99.69%, and 99.61% on the Muchmore Springer dataset and 94.10%, 89.45%, 92.91%, and 90.35% on the Ohsumed dataset. The CH index value of MKFTM is higher, which shows that its clustering performance is better than that of state-of-the-art topic models. Conclusion: The results confirm that the proposed MKFTM approach efficiently handles the sparsity and redundancy problems in biomedical text documents. MKFTM discovers semantically relevant topics with high accuracy for biomedical documents. It gives better results for classification and clustering of biomedical documents. MKFTM is a new approach to topic modeling that has the flexibility to work with a variety of clustering methods.
