Motivation
Protein–protein interactions (PPIs) play important roles in many biological processes. Conventional biological experiments for identifying PPI sites are costly and time-consuming. Thus, many computational approaches have been proposed to predict PPI sites. Existing computational methods usually use local contextual features to predict PPI sites. Actually, global features of protein sequences are critical for PPI site prediction.
Results
A new end-to-end deep learning framework, named DeepPPISP, through combining local contextual and global sequence features, is proposed for PPI site prediction. For local contextual features, we use a sliding window to capture features of neighbors of a target amino acid as in previous studies. For global sequence features, a text convolutional neural network is applied to extract features from the whole protein sequence. Then the local contextual and global sequence features are combined to predict PPI sites. By integrating local contextual and global sequence features, DeepPPISP achieves the state-of-the-art performance, which is better than the other competing methods. In order to investigate if global sequence features are helpful in our deep learning model, we remove or change some components in DeepPPISP. Detailed analyses show that global sequence features play important roles in DeepPPISP.
Availability and implementation
The DeepPPISP web server is available at http://bioinformatics.csu.edu.cn/PPISP/. The source code can be obtained from https://github.com/CSUBioGroup/DeepPPISP.
Supplementary information
Supplementary data are available at Bioinformatics online.
Genes that are thought to be critical for the survival of organisms or cells are called essential genes. The prediction of essential genes and their products (essential proteins) is of great value in exploring the mechanism of complex diseases, the study of the minimal required genome for living cells and the development of new drug targets. As laboratory methods are often complicated, costly and time-consuming, a great many of computational methods have been proposed to identify essential genes/proteins from the perspective of the network level with the in-depth understanding of network biology and the rapid development of biotechnologies. Through analyzing the topological characteristics of essential genes/proteins in protein–protein interaction networks (PINs), integrating biological information and considering the dynamic features of PINs, network-based methods have been proved to be effective in the identification of essential genes/proteins. In this paper, we survey the advanced methods for network-based prediction of essential genes/proteins and present the challenges and directions for future research.
ICD-9 (the Ninth Revision of International Classification of Diseases) is widely used to describe a patient's diagnosis. Accurate automated ICD-9 coding is important because manual coding is expensive, time-consuming and inefficient. Inspired by the recent successes of deep learning, in this study, we present a deep learning framework called DeepLabeler to automatically assign ICD-9 codes. DeepLabeler combines the convolutional neural network with the 'Document to Vector' technique to extract and encode local and global features. Our proposed DeepLabeler demonstrates its effectiveness by achieving state-of-the-art performance, i.e., 0.335 micro F-measure on MIMIC-II dataset and 0.408 micro F-measure on MIMIC-III dataset. It outperforms classical hierarchy-based SVM and flat-SVM both on these two datasets by at least 14%. Furthermore, we analyze the deep neural network structure to discover the vital elements in the success of DeepLabeler. We find that the convolutional neural network is the most effective component in our network and the 'Document to Vector' technique is also necessary for enhancing classification performance since it extracts well-recognized global features. Extensive experimental results demonstrate that the great promise of deep learning techniques in the field of text multi-label classification and automated medical coding.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.