Measuring semantic similarity of clinical trial outcomes using deep pre-trained language representations

Koroleva, Anna; Kamath, Sanjay; Paroubek, Patrick

doi:10.1016/j.yjbinx.2019.100058

Cited by 16 publications (7 citation statements)

References 34 publications


“…The work presented in [ 20 ] focused on identifying the similarity between outcomes reported in the scientific literature. To do so, this team annotated outcomes in a corpus of texts about clinical trials from PubMed Central; these data were later used to train deep learning algorithms (BERT-based models, [ 21 ]) for automatic similarity assessment.…”

Section: Related Work (mentioning)

confidence: 99%

A clinical trials corpus annotated with UMLS entities to enhance the access to evidence-based medicine

Llanos; Valverde-Mateos; Capllonch-Carrión et al. 2021

BMC Med Inform Decis Mak


Background The large volume of medical literature makes it difficult for healthcare professionals to keep abreast of the latest studies that support Evidence-Based Medicine. Natural language processing enhances access to relevant information, and gold standard corpora are required to improve systems. To contribute a new dataset for this domain, we collected the Clinical Trials for Evidence-Based Medicine in Spanish (CT-EBM-SP) corpus. Methods We annotated 1200 texts about clinical trials with entities from the Unified Medical Language System semantic groups: anatomy (ANAT), pharmacological and chemical substances (CHEM), pathologies (DISO), and lab tests, diagnostic or therapeutic procedures (PROC). We doubly annotated 10% of the corpus and measured inter-annotator agreement (IAA) using F-measure. As a use case, we ran medical entity recognition experiments with neural network models. Results This resource contains 500 abstracts of journal articles about clinical trials and 700 announcements of trial protocols (292 173 tokens). We annotated 46 699 entities (13.98% are nested entities). Regarding IAA, we obtained an average F-measure of 85.65% (±4.79, strict match) and 93.94% (±3.31, relaxed match). In the use case experiments, we achieved recognition results ranging from 80.28% (±0.99) to 86.74% (±0.19) average F-measure. Conclusions Our results show that this resource is adequate for experiments with state-of-the-art approaches to biomedical named entity recognition. It is freely distributed at: http://www.lllf.uam.es/ESP/nlpmedterm_en.html. The methods are generalizable to other languages with similar available sources.
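The abstract reports IAA as an F-measure under strict and relaxed matching but does not spell out the matching rules. The sketch below assumes one annotator is treated as the reference, that a strict match means identical character offsets and identical semantic group, and that a relaxed match means the same group with overlapping offsets. The `pairwise_f_measure` helper and the `(start, end, label)` span format are hypothetical illustrations, not the authors' code.

```python
from typing import List, Tuple

# Hypothetical span format: (start_offset, end_offset, label), end exclusive.
Span = Tuple[int, int, str]


def _strict(a: Span, b: Span) -> bool:
    # Strict match: identical boundaries and identical semantic group.
    return a == b


def _relaxed(a: Span, b: Span) -> bool:
    # Relaxed match (assumed definition): same label, overlapping offsets.
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def pairwise_f_measure(ann_a: List[Span], ann_b: List[Span],
                       relaxed: bool = False) -> float:
    """F-measure between two annotators, treating ann_a as the reference."""
    match = _relaxed if relaxed else _strict
    if not ann_a and not ann_b:
        return 1.0  # nothing annotated by either side: trivially in agreement
    # Recall: share of A's entities that B also found.
    recall = sum(any(match(a, b) for b in ann_b) for a in ann_a) / len(ann_a) if ann_a else 0.0
    # Precision: share of B's entities that also appear in A.
    precision = sum(any(match(b, a) for a in ann_a) for b in ann_b) / len(ann_b) if ann_b else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example: annotator B shortens the CHEM span and adds an ANAT span.
a = [(0, 9, "DISO"), (20, 29, "CHEM"), (40, 48, "PROC")]
b = [(0, 9, "DISO"), (20, 25, "CHEM"), (60, 66, "ANAT")]
print(round(pairwise_f_measure(a, b), 3))                # 0.333: only DISO matches exactly
print(round(pairwise_f_measure(a, b, relaxed=True), 3))  # 0.667: overlapping CHEM now counts
```

The gap between the two scores in this toy case mirrors why relaxed agreement is typically higher than strict agreement: boundary disagreements on otherwise shared entities are forgiven.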



“…For example, the identification of functional links between proteins has been recently conducted by fine-tuning weights from BioBERT [44]. Besides, several research manuscripts have reported better outcomes when the BioBERT model is implemented [47][48][49][50] in the literature.…”

Section: BioBERT Model (mentioning)

confidence: 99%

A hybrid algorithm for clinical decision support in precision medicine based on machine learning

Zhang; Lin 2023

BMC Bioinformatics


Purpose The objective of the manuscript is to propose a hybrid algorithm combining an improved BM25 algorithm, k-means clustering, and the BioBERT model to better identify relevant biomedical articles in the PubMed database, so that more of the retrieved articles contain information closely related to a query about a specific disease. Design/methodology/approach The paper proposes a two-stage information retrieval method that implements an improved TextRank algorithm. In the first stage, the improved BM25 algorithm assigns scores to biomedical articles in the database and identifies the 1000 highest-scoring publications. In the second stage, a cluster-based abstract extraction method condenses the article abstracts so that they fit the input constraints of the BioBERT model, and a BioBERT-based document similarity matching method then compares the documents with the retrieved morphemes to obtain the most similar search results. For reproducibility, the code is available at https://github.com/zzc1991/TREC_Precision_Medicine_Track. Findings The proposed model is trained on the TREC 2017 and TREC 2018 data sets, and the TREC 2019 data serve as a validation set; the experiments confirm the effectiveness and practicability of the proposed algorithm for clinical decision support in precision medicine and indicate that it generalizes. Originality/value This research integrates multiple machine learning and text processing methods into a hybrid method applicable to domain-specific medical literature retrieval. The proposed algorithm achieves a 3% increase in P@10 over the state-of-the-art algorithm in TREC 2019.
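The first retrieval stage described above is a BM25 ranking followed by a cut to the 1000 highest-scoring articles. The abstract does not say what makes its BM25 "improved", so this sketch uses the standard Okapi BM25 formula with common default parameters (k1 = 1.2, b = 0.75) and a Lucene-style IDF instead. The `bm25_top_k` helper is hypothetical, and tokenization is assumed to happen upstream.

```python
import math
from collections import Counter
from typing import List, Tuple


def bm25_top_k(query: List[str], docs: List[List[str]], k: int = 1000,
               k1: float = 1.2, b: float = 0.75) -> List[Tuple[int, float]]:
    """Score tokenized docs against a tokenized query with standard Okapi BM25
    and return the k highest-scoring (doc_index, score) pairs."""
    if not docs:
        return []
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of docs containing each term at least once.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for i, doc in enumerate(docs):
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Lucene-style IDF, which stays positive even for very common terms.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) and document-length normalization (b).
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append((i, score))
    # Highest scores first; the sort is stable, so ties keep document order.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


docs = [["melanoma", "braf", "mutation"],
        ["lung", "cancer", "egfr"],
        ["melanoma", "treatment"]]
print([i for i, _ in bm25_top_k(["melanoma", "braf"], docs, k=2)])  # [0, 2]
```

In the pipeline the abstract describes, the returned indices would feed the second stage (cluster-based abstract extraction and BioBERT similarity matching), which is not reproduced here.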


“…Among five well-known methods, the BERT showed the best performances for normalization of procedure and diagnosis. In addition, in [57], the authors presented a BERT-based model to measure semantic similarity of clinical trial outcomes. Moreover, another text analysis approach for medical applications was proposed by Zhang et al [58] using BERT.…”

Section: Applications of Bidirectional Encoder Representations from Transformers (BERT) (mentioning)

confidence: 99%

Social Media Toxicity Classification Using Deep Learning: Real-World Application UK Brexit

Fan; Dahou et al. 2021

Electronics


Social media has become an essential facet of modern society, wherein people share their opinions on a wide variety of topics. Social media is quickly becoming indispensable for a majority of people, and many cases of social media addiction have been documented. Social media platforms such as Twitter have demonstrated over the years the value they provide, such as connecting people from all over the world with different backgrounds. However, they have also shown harmful side effects that can have serious consequences. One such harmful side effect of social media is the immense toxicity that can be found in various discussions. The word toxic has become synonymous with online hate speech, internet trolling, and sometimes outrage culture. In this study, we build an efficient model to detect and classify toxicity in user-generated social media content using Bidirectional Encoder Representations from Transformers (BERT). The pre-trained BERT model and three of its variants have been fine-tuned on a well-known labeled toxic comment dataset, the Kaggle public dataset from the Toxic Comment Classification Challenge. Moreover, we test the proposed models on two datasets collected from Twitter in two different periods to detect toxicity in user-generated content (tweets), using hashtags related to UK Brexit. The results showed that the proposed model can efficiently classify and analyze toxic tweets.

