Classification of Biomedical Texts for Cardiovascular Diseases with Deep Neural Network Using a Weighted Feature Representation Method

Ahmed, Nizar A.; Dilmaç, Fatih; Alpkoçak, Adil

doi:10.3390/healthcare8040392

Cited by 5 publications

(3 citation statements)

References 23 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…With this method, we are able to retrieve the useful data from an unstructured log file. Many studies have used the BoW method [56][57][58][59], and in a recent study, combining BoW with word embeddings, Ahmed et al showed how to improve classification of biomedical texts [60].…”

Section: Bag Of Words (Bow)mentioning

confidence: 99%

A Novel Framework for Extracting Knowledge Management from Business Intelligence Log Files in Hospitals

Turkeli

Özaydın²

2022

Applied Sciences

View full text Add to dashboard Cite

This paper proposes a framework to extract knowledge-management elements from business systems in healthcare organizations. According to results of in-depth interviews with experts in the field, a framework is defined, and software was developed to generate log files. Following the application of the Bag of Words (BoW) method on log files of 455 days for feature extraction, the k-means algorithm was used to cluster the feature vectors. The framework was tested with queries for confirmation. The developed framework successfully clustered the generated reports at operational, tactical, and strategic levels to extract knowledge-management elements. This study provides evidence for the knowledge-management pyramid by finding that the generated reports are reviewed mostly at the operational level, then tactical, and then the least at the strategic level. Our framework has the potential to be used not only in the health sector, but also in banking, insurance, and other businesses using business intelligence, especially in accordance with the organization’s goals at operational, tactical, and strategic levels of the knowledge-management pyramid.

show abstract

Section: Bag Of Words (Bow)mentioning

confidence: 99%

A Novel Framework for Extracting Knowledge Management from Business Intelligence Log Files in Hospitals

Turkeli

Özaydın²

2022

Applied Sciences

View full text Add to dashboard Cite

show abstract

“…Text classification tasks within the medical domain primarily benefited from domain-specific features, often generated via the utilization of knowledge sources such as the unified medical language system (UMLS) [ 9 ]. With the emergence of methods for generating effective numeric representations of texts or word embeddings (dense vectors), coupled with advances in computational capabilities, deep neural network based approaches became dominant in this space, obtaining SOTA performances in many text classification tasks [ 10 , 11 ]. Such approaches use dense vector representations, and generally require large volumes of annotated data.…”

Section: Introductionmentioning

confidence: 99%

Comparison of Pretraining Models and Strategies for Health-Related Social Media Text Classification

Guo

Yang

et al. 2022

Healthcare

View full text Add to dashboard Cite

Pretrained contextual language models proposed in the recent past have been reported to achieve state-of-the-art performances in many natural language processing (NLP) tasks, including those involving health-related social media data. We sought to evaluate the effectiveness of different pretrained transformer-based models for social media-based health-related text classification tasks. An additional objective was to explore and propose effective pretraining strategies to improve machine learning performance on such datasets and tasks. We benchmarked six transformer-based models that were pretrained with texts from different domains and sources—BERT, RoBERTa, BERTweet, TwitterBERT, BioClinical_BERT, and BioBERT—on 22 social media-based health-related text classification tasks. For the top-performing models, we explored the possibility of further boosting performance by comparing several pretraining strategies: domain-adaptive pretraining (DAPT), source-adaptive pretraining (SAPT), and a novel approach called topic specific pretraining (TSPT). We also attempted to interpret the impacts of distinct pretraining strategies by visualizing document-level embeddings at different stages of the training process. RoBERTa outperformed BERTweet on most tasks, and better than others. BERT, TwitterBERT, BioClinical_BERT and BioBERT consistently underperformed. For pretraining strategies, SAPT performed better or comparable to the off-the-shelf models, and significantly outperformed DAPT. SAPT+TSPT showed consistently high performance, with statistically significant improvement in three tasks. Our findings demonstrate that RoBERTa and BERTweet are excellent off-the-shelf models for health-related social media text classification, and extended pretraining using SAPT and TSPT can further improve performance.

show abstract

“…9 With the emergence of methods for generating effective numeric representations of texts or word embeddings (dense vectors), coupled with advances in computational capabilities, deep neural network based approaches became dominant in this space, obtaining SOTA performances in many text classification tasks. 10,11 Such approaches use dense vector representations, and generally require large volumes of annotated data. Word embedding generation approaches such as Word2Vec 12 and GLoVe 13 are capable of effectively capturing semantic representations of words/phrases ( ie ., text fragments with similar meanings appear close together in vector space), which n-gram based approaches were not capable of.…”

Section: Introductionmentioning

confidence: 99%

Comparison of pretraining models and strategies for health-related social media text classification

Guo

Yang

et al. 2021

Preprint

View full text Add to dashboard Cite

Motivation Pretrained contextual language models proposed in the recent past have been reported to achieve state-of-the-art performances in many natural language processing (NLP) tasks. There is a need to benchmark such models for targeted NLP tasks, and to explore effective pretraining strategies to improve machine learning performance. Results In this work, we addressed the task of health-related social media text classification. We benchmarked five models-RoBERTa, BERTweet, TwitterBERT, BioClinical_BERT, and BioBERT on 22 tasks. We attempted to boost performance for the best models by comparing distinct pretraining strategies-domain-adaptive pretraining (DAPT), source-adaptive pretraining (SAPT), and topic-specific pretraining (TSPT). RoBERTa and BERTweet performed comparably in most tasks, and better than others. For pretraining strategies, SAPT performed better or comparable to the off-the-shelf models, and significantly outperformed DAPT. SAPT+TSPT showed consistently high performance, with statistically significant improvement in one task. Our findings demonstrate that RoBERTa and BERTweet are excellent off-the-shelf models for health-related social media text classification, and extended pretraining using SAPT and TSPT can further improve performance.

show abstract

Classification of Biomedical Texts for Cardiovascular Diseases with Deep Neural Network Using a Weighted Feature Representation Method

Cited by 5 publications

References 23 publications

A Novel Framework for Extracting Knowledge Management from Business Intelligence Log Files in Hospitals

A Novel Framework for Extracting Knowledge Management from Business Intelligence Log Files in Hospitals

Comparison of Pretraining Models and Strategies for Health-Related Social Media Text Classification

Comparison of pretraining models and strategies for health-related social media text classification

Contact Info

Product

Resources

About