DeepPPPred: An Ensemble of BERT, CNN, and RNN for Classifying Co-mentions of Proteins and Phenotypes

Shahri, Morteza Pourreza; Lyon, Katrina; Schearer, Julia; Kahanda, Indika

doi:10.1101/2020.09.18.304329

Cited by 5 publications

(2 citation statements)

References 42 publications

“…The combination model claims 98% accuracy in patient criteria matching. DeepPPPred [38], which is an ensemble classifier employing three versions of deep neural networks (recurrent neural networks (RNN), CNN, and BERT), outperforms its constituent individual neural networks. However, the COMPOSE model is for patient-trial matching and not patient similarity matching, whereas DeepPPPred is for protein classification.…”

Section: Related Work (mentioning)

confidence: 99%

A Novel Patient Similarity Network (PSN) Framework Based on Multi-Model Deep Learning for Precision Medicine

Navaz

Kassabi

Serhani

2022

JPM

Precision medicine can be defined as the comparison of a new patient with existing patients that have similar characteristics and can be referred to as patient similarity. Several deep learning models have been used to build and apply patient similarity networks (PSNs). However, the challenges related to data heterogeneity and dimensionality make it difficult to use a single model to reduce data dimensionality and capture the features of diverse data types. In this paper, we propose a multi-model PSN that considers heterogeneous static and dynamic data. The combination of deep learning models and PSN allows ample clinical evidence and information extraction against which similar patients can be compared. We use the bidirectional encoder representations from transformers (BERT) to analyze the contextual data and generate word embedding, where semantic features are captured using a convolutional neural network (CNN). Dynamic data are analyzed using a long-short-term-memory (LSTM)-based autoencoder, which reduces data dimensionality and preserves the temporal features of the data. We propose a data fusion approach combining temporal and clinical narrative data to estimate patient similarity. The experiments we conducted proved that our model provides a higher classification accuracy in determining various patient health outcomes when compared with other traditional classification algorithms.

“…Opinion mining is a highly discussed topic as there is a lot of unstructured data that makes developers and researchers do their own research to receive helpful insights. A text analysis of opinions can assist in understanding how people feel about different topics and events through opinions mining [5][6][7][8][9][10][11]. During the COVID-19 epidemic, several methods have been proposed to understand public attitudes and behaviors in the face of the pandemic [10,[12][13][14][15][16][17].…”

Section: Introduction (mentioning)

confidence: 99%

Spatial Impressions Monitoring during COVID-19 Pandemic Using Machine Learning Techniques

et al. 2022

During the COVID-19 epidemic, Twitter has become a vital platform for people to express their impressions and feelings towards the COVID-19 epidemic. There is an unavoidable need to examine various patterns on social media platforms in order to reduce public anxiety and misconceptions. Based on this study, various public service messages can be disseminated, and necessary steps can be taken to manage the scourge. There has already been a lot of work conducted in several languages, but little has been conducted on Arabic tweets. The primary goal of this study is to analyze Arabic tweets about COVID-19 and extract people’s impressions of different locations. This analysis will provide some insights into understanding public mood variation on Twitter, which could be useful for governments to identify the effect of COVID-19 over space and make decisions based on that understanding. To achieve that, two strategies are used to analyze people’s impressions from Twitter: machine learning approach and the deep learning approach. To conduct this study, we scraped Arabic tweets up with 12,000 tweets that were manually labeled and classify them as positive, neutral or negative feelings. Specialising in Saudi Arabia, the collected dataset consists of 2174 positive tweets and 2879 negative tweets. First, TF-IDF feature vectors are used for feature representation. Then, several models are implemented to identify people’s impression over time using Twitter Geo-tag information. Finally, Geographic Information Systems (GIS) are used to map the spatial distribution of people’s emotions and impressions. Experimental results show that SVC outperforms other methods in terms of performance and accuracy.

Deep semi-supervised learning ensemble framework for classifying co-mentions of human proteins and phenotypes

Shahri

Kahanda

2021

BMC Bioinformatics

Self Cite

Background Identifying human protein-phenotype relationships has attracted researchers in bioinformatics and biomedical natural language processing due to its importance in uncovering rare and complex diseases. Since experimental validation of protein-phenotype associations is prohibitive, automated tools capable of accurately extracting these associations from the biomedical text are in high demand. However, while the manual annotation of protein-phenotype co-mentions required for training such models is highly resource-consuming, extracting millions of unlabeled co-mentions is straightforward. Results In this study, we propose a novel deep semi-supervised ensemble framework that combines deep neural networks, semi-supervised, and ensemble learning for classifying human protein-phenotype co-mentions with the help of unlabeled data. This framework allows the ability to incorporate an extensive collection of unlabeled sentence-level co-mentions of human proteins and phenotypes with a small labeled dataset to enhance overall performance. We develop PPPredSS, a prototype of our proposed semi-supervised framework that combines sophisticated language models, convolutional networks, and recurrent networks. Our experimental results demonstrate that the proposed approach provides a new state-of-the-art performance in classifying human protein-phenotype co-mentions by outperforming other supervised and semi-supervised counterparts. Furthermore, we highlight the utility of PPPredSS in powering a curation assistant system through case studies involving a group of biologists. Conclusions This article presents a novel approach for human protein-phenotype co-mention classification based on deep, semi-supervised, and ensemble learning. The insights and findings from this work have implications for biomedical researchers, biocurators, and the text mining community working on biomedical relationship extraction.
