Identifying interactions between drugs and target proteins is an important area of drug research and offers a route to lower-risk, faster drug development. However, traditional experimental screening of drug-target interactions (DTIs) is time-consuming and costly, and it suffers from high false-positive and false-negative rates. Effective computational methods that predict DTIs accurately are therefore needed in the post-genomic era. In this article, we propose a new computational method for predicting DTIs from drug molecular structure and protein sequence, using a stacked autoencoder, a deep learning model, to extract information from the raw data. The method automatically mines hidden information from protein sequences and generates highly representative features through multiple layers of iteration. These features are then combined with molecular substructure fingerprints to construct the feature descriptors, which are fed into a rotation forest classifier for prediction. In fivefold cross-validation on gold standard data sets (enzymes, ion channels, G-protein-coupled receptors [GPCRs], and nuclear receptors), the proposed method achieves accuracies of 0.9414, 0.9116, 0.8669, and 0.8056, respectively. We further evaluate the method by comparing it with other feature extraction algorithms, state-of-the-art classifiers, and other published methods on the same data sets. The comparison results show that the proposed method is highly competitive for predicting drug-target interactions.
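The accuracies above come from fivefold cross-validation. As a rough illustration of that evaluation protocol only, the sketch below splits samples into five disjoint folds and averages the per-fold accuracy. The function names and the toy data are hypothetical, and the threshold classifier is a stand-in placeholder, not the rotation forest used in the paper.

```python
import random


def fivefold_indices(n_samples, n_folds=5, seed=0):
    """Shuffle sample indices and deal them into n_folds disjoint test folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cross_validate(features, labels, fit, predict, n_folds=5, seed=0):
    """Train on n_folds - 1 folds, test on the held-out fold, and repeat."""
    scores = []
    for test_idx in fivefold_indices(len(labels), n_folds, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        model = fit([features[i] for i in train_idx],
                    [labels[i] for i in train_idx])
        predictions = predict(model, [features[i] for i in test_idx])
        scores.append(accuracy([labels[i] for i in test_idx], predictions))
    return sum(scores) / n_folds, scores


# Placeholder classifier (an assumption, NOT a rotation forest): it thresholds
# the first feature at the midpoint between the two class means.
def fit_threshold(X, y):
    pos = [x[0] for x, label in zip(X, y) if label == 1]
    neg = [x[0] for x, label in zip(X, y) if label == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def predict_threshold(threshold, X):
    return [1 if x[0] > threshold else 0 for x in X]


# Toy, well-separated data: 10 non-interacting and 10 interacting "pairs".
toy_X = [[float(i)] for i in range(10)] + [[100.0 + i] for i in range(10)]
toy_y = [0] * 10 + [1] * 10
mean_acc, fold_accs = cross_validate(toy_X, toy_y, fit_threshold, predict_threshold)
```

Any classifier that exposes a fit step and a predict step can be passed in place of the placeholder, so the same loop could wrap a rotation forest implementation.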
Proteins are essential components of living organisms. Predicting protein-protein interactions (PPIs) has important implications for understanding life processes, preventing diseases, and developing new drugs. Although high-throughput technology makes it possible to identify PPIs in large-scale biological experiments, the extensive use of experimental methods is restricted by time, cost, false-positive rates, and other constraints. Computational methods are therefore urgently needed as a supplement to experimental methods to predict PPIs rapidly and accurately. In this paper, we propose a novel approach, CNN-FSRF, for predicting PPIs from protein sequences by combining a deep learning Convolutional Neural Network (CNN) with a Feature-Selective Rotation Forest (FSRF). The method first converts the protein sequence into a Position-Specific Scoring Matrix (PSSM) containing evolutionary information, then uses the CNN to extract deeply hidden features of the protein, and finally uses FSRF to remove redundant noise and produce the prediction. On the Yeast and Helicobacter pylori PPI datasets, CNN-FSRF achieved prediction accuracies of 97.75% and 88.96%, respectively. To further evaluate prediction performance, we compared CNN-FSRF with SVM and other existing methods, and we also verified its performance on independent datasets. The experimental results indicate that CNN-FSRF can serve as a useful complement to biological experiments for identifying protein interactions.
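A PSSM has one row per residue and 20 columns, so its height varies with sequence length, whereas a CNN with a fixed input layer generally expects matrices of one size. The abstract does not say how CNN-FSRF handles this. The sketch below shows one common option, truncating or zero-padding to a fixed number of rows, purely as an assumption; the function name, `target_rows`, and `pad_value` are hypothetical.

```python
def to_fixed_length(pssm, target_rows, n_columns=20, pad_value=0.0):
    """Return a copy of `pssm` with exactly `target_rows` rows.

    `pssm` is a list of rows, one per residue, each holding `n_columns`
    scores. Longer matrices are truncated; shorter ones are padded with rows
    of `pad_value`. This is an assumed preprocessing step, not one taken
    from the paper.
    """
    for row in pssm:
        if len(row) != n_columns:
            raise ValueError("every PSSM row must have %d columns" % n_columns)
    clipped = [list(row) for row in pssm[:target_rows]]
    missing = target_rows - len(clipped)
    padding = [[pad_value] * n_columns for _ in range(missing)]
    return clipped + padding


# Toy PSSM for a 3-residue sequence (all scores are made-up illustration values).
toy_pssm = [[float(r)] * 20 for r in range(3)]
padded = to_fixed_length(toy_pssm, target_rows=5)
truncated = to_fixed_length(toy_pssm, target_rows=2)
```

Padding keeps every score but adds uninformative rows, while truncation discards residues beyond the cutoff, so the choice of `target_rows` is a trade-off between the two.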
Identifying interactions between drug compounds and target proteins is the basis of drug research and plays a crucial role in drug discovery. However, determining drug-target interactions (DTIs) and potential protein-compound interactions by experimental methods alone is complicated, expensive, and time-consuming. Hence, there is strong motivation to design in silico prediction methods that overcome these obstacles. In this work, we designed a novel in silico strategy to predict proteome-scale DTIs, based on the assumption that DTI pairs can be expressed through the evolutionary information derived from frequency profiles and the drugs' structural properties. To achieve this, drug molecules are encoded as substructure fingerprints that represent the presence of certain fragments; target proteins are first converted into position-specific scoring matrices (PSSMs) and then encoded as 2-dimensional principal component analysis (2DPCA) descriptors. In the prediction phase, a feature-weighted rotation forest (RF) classifier estimates whether a drug and a target interact, evaluated on four benchmark datasets: Enzymes, Ion Channels, GPCRs, and Nuclear Receptors. The cross-validation prediction accuracy on the four datasets is 95.40%, 88.82%, 85.67%, and 82.22%, respectively. To assess the proposed approach more clearly, we compared it with a discrete cosine transform (DCT) descriptor model, a support vector machine (SVM) classifier model, and existing approaches, including DBSI, NetCBP, KBMF2K, SIMCOMP, and RFDT. The experimental results indicate that the proposed approach can effectively improve DTI prediction accuracy and can be used as a practical tool for the research and design of new drugs.
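The abstract describes each drug-target pair through a drug fingerprint and a protein descriptor. A minimal sketch of how such a pair could be turned into one input vector is shown below, assuming simple concatenation; the abstract does not state the exact fusion step, and the function names, identifiers, and values are hypothetical.

```python
def pair_descriptor(fingerprint_bits, protein_descriptor):
    """Concatenate a binary drug fingerprint with a protein feature vector."""
    if any(bit not in (0, 1) for bit in fingerprint_bits):
        raise ValueError("fingerprint must contain only 0/1 bits")
    drug_part = [float(bit) for bit in fingerprint_bits]
    protein_part = [float(value) for value in protein_descriptor]
    return drug_part + protein_part


def build_dataset(drugs, proteins, known_interactions):
    """Build (features, labels) for every drug-protein combination.

    `drugs` maps a drug id to its fingerprint bits, `proteins` maps a protein
    id to its descriptor, and `known_interactions` is a set of
    (drug_id, protein_id) pairs labeled 1; all other pairs are labeled 0.
    """
    features, labels = [], []
    for drug_id in sorted(drugs):
        for protein_id in sorted(proteins):
            features.append(pair_descriptor(drugs[drug_id], proteins[protein_id]))
            labels.append(1 if (drug_id, protein_id) in known_interactions else 0)
    return features, labels


# Toy inputs: 4-bit fingerprints and 2-value protein descriptors (made up).
toy_drugs = {"d1": [1, 0, 0, 1], "d2": [0, 1, 1, 0]}
toy_proteins = {"p1": [0.5, -1.2], "p2": [2.0, 0.3]}
toy_known = {("d1", "p2"), ("d2", "p1")}
X, y = build_dataset(toy_drugs, toy_proteins, toy_known)
```

Labeling every unobserved pair as a negative is itself a simplification made for this toy example; it says nothing about how the paper samples its negative pairs.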
The study of self-interacting proteins (SIPs) not only reveals the function of proteins at the molecular level but is also crucial for understanding activities such as growth, development, differentiation, and apoptosis, providing an important theoretical basis for exploring the mechanisms of major diseases. With rapid advances in biotechnology, a large number of SIPs have been discovered. However, because biological experiments are slow and costly, the gap between the identification of SIPs and the accumulation of data is growing. Fast and accurate computational methods are therefore needed to predict SIPs effectively. In this study, we designed a new method, NLPEI, for predicting SIPs based on natural language understanding theory and evolutionary information. Specifically, we first treat the protein sequence as natural language and use natural language processing algorithms to extract its features. Then, we use the Position-Specific Scoring Matrix (PSSM) to represent the evolutionary information of the protein and extract its features with a deep learning Stacked Auto-Encoder (SAE). Finally, we fuse the natural language features of the proteins with the evolutionary features and make predictions with an Extreme Learning Machine (ELM) classifier. On the human and yeast SIP gold standard data sets, NLPEI achieved prediction accuracies of 94.19% and 91.29%, respectively. Compared with different classifier models, different feature models, and other existing methods, NLPEI obtained the best results. These experimental results indicate that NLPEI is an effective tool for predicting SIPs and can provide reliable candidates for biological experiments.
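Treating a protein sequence as natural language requires some notion of "words". The abstract does not name the algorithm NLPEI uses, so the sketch below only illustrates one widely used convention, assumed here: overlapping k-mers as words, summarized by their relative frequencies. The function names and the example sequence are hypothetical.

```python
from collections import Counter


def kmer_words(sequence, k=3):
    """Split a protein sequence into overlapping k-residue 'words'."""
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def kmer_frequencies(sequence, k=3):
    """Relative frequency of each k-mer word in the sequence.

    Returns an empty dict when the sequence is shorter than k.
    """
    words = kmer_words(sequence, k)
    if not words:
        return {}
    counts = Counter(words)
    return {word: count / len(words) for word, count in counts.items()}


# Made-up example sequence, used only to show the tokenization.
words = kmer_words("MKTAYMKT")
frequencies = kmer_frequencies("MKTAYMKT")
```

A word list like this could then be passed to any text-embedding or bag-of-words model, and the resulting vector fused with the PSSM-derived features described above.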