The assessment of binding affinity between ligands and their target proteins plays an essential role in the drug discovery and design process. As an alternative to widely used scoring approaches, machine learning methods have been proposed for fast prediction of binding affinity, with promising results. However, most of them were developed as all-purpose models that disregard the specific functions of different protein families, even though proteins from different functional families have different structures and physicochemical features. In this study, we propose a random forest method to predict protein-ligand binding affinity based on a comprehensive feature set covering protein sequence, binding pocket, ligand structure and intermolecular interaction. Feature processing and compression were carried out separately for each protein family dataset; the results indicate that different features contribute to different models, so an individual representation for each protein family is necessary. Three family-specific models were constructed for three important protein target families: HIV-1 protease, trypsin and carbonic anhydrase. For comparison, two generic models covering diverse protein families were also built. The evaluation results show that models trained on family-specific datasets outperform those trained on the generic datasets. On the test sets, the Pearson correlation coefficients (Rp) are 0.740, 0.874 and 0.735, and the Spearman correlation coefficients (Rs) are 0.697, 0.853 and 0.723, for HIV-1 protease, trypsin and carbonic anhydrase respectively. Comparisons with other methods further demonstrate that individual representation and model construction for each protein family is a more reasonable way to predict the affinity of a particular protein family.
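The Pearson and Spearman coefficients reported in the abstract above compare predicted with measured affinities. The following is a minimal sketch of how the two coefficients can be computed; the function names and the affinity values are hypothetical and are not taken from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical measured vs. predicted affinities (illustrative values only).
measured = [5.2, 6.8, 7.1, 8.4, 9.0]
predicted = [5.9, 6.5, 7.8, 7.6, 9.3]
print(pearson(measured, predicted), spearman(measured, predicted))
```

Pearson measures linear agreement, while Spearman only checks whether the predicted ordering of ligands matches the measured one, which is why the two values are usually reported together.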
The prediction of drug-target interactions is a key step in the drug discovery process, which serves to identify new drugs or novel targets for existing drugs. However, experimental methods for identifying drug-target interactions are expensive and time-consuming. Therefore, the in silico prediction of drug-target interactions has recently attracted increasing attention. In this study, we propose an eigenvalue transformation technique and apply it to two representative algorithms that have been used to predict drug-target interactions: the Regularized Least Squares classifier (RLS) and the semi-supervised link prediction classifier (SLP). The results of computational experiments show that the algorithms with eigenvalue transformation achieved better performance on drug-target interaction prediction than the original algorithms did. These findings show that eigenvalue transformation is an efficient technique for improving the performance of methods for predicting drug-target interactions. We further show that, in theory, eigenvalue transformation can be viewed as a feature transformation on the kernel matrix. Accordingly, although we apply this technique to only two algorithms in the current study, eigenvalue transformation also has the potential to be applied to other kernel-based algorithms.
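The abstract above does not state which function is applied to the eigenvalues. As a hedged illustration only, the sketch below assumes a power transform (each eigenvalue raised to an exponent `alpha`) applied to a symmetric kernel matrix; the function name `transform_kernel`, the parameter `alpha`, and the example matrix are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def transform_kernel(K, alpha):
    """Rebuild a symmetric kernel matrix after transforming its eigenvalues.

    Assumption: the transform is lambda -> lambda ** alpha. The source
    abstract does not specify the transform actually used.
    """
    K = np.asarray(K, dtype=float)
    K = (K + K.T) / 2.0                      # enforce exact symmetry
    eigvals, eigvecs = np.linalg.eigh(K)     # K = V diag(eigvals) V^T
    eigvals = np.clip(eigvals, 0.0, None)    # drop tiny negative round-off
    transformed = eigvals ** alpha
    # Scale each eigenvector column by its transformed eigenvalue, then
    # project back: V diag(transformed) V^T.
    return (eigvecs * transformed) @ eigvecs.T


# Hypothetical 2x2 similarity kernel with eigenvalues 1 and 3.
K = np.array([[2.0, 1.0],
              [1.0, 2.0]])
print(transform_kernel(K, alpha=0.5))
```

Because only the eigenvalues change and the eigenvectors are kept, the result can be passed to any method that consumes a kernel matrix, which is consistent with the abstract's remark that the technique may extend to other kernel-based algorithms.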
Single-nucleotide polymorphisms (SNPs) are the most frequent form of genetic variation. Non-synonymous SNPs (nsSNPs) occurring in coding regions result in single amino acid substitutions that can be associated with human hereditary diseases. Many approaches have been designed for distinguishing deleterious from neutral nsSNPs based on sequence-level information. As a novel contribution of this work, combinations of protein-protein interaction (PPI) network topological features were introduced for predicting disease-related nsSNPs. Based on a dataset compiled from Swiss-Prot, a random forest model was constructed that achieved an average accuracy of 80.43% and an MCC of 0.60 in a rigorous tenfold cross-validation test. On an independent dataset, our model achieved an accuracy of 88.05% and an MCC of 0.67. Compared with previous studies, our approach showed superior prediction ability. The results showed that the incorporated PPI network topological features outperform conventional features. Our further analysis indicated that disease-related proteins are topologically different from other proteins. This study suggests that nsSNPs may share some topological information of proteins and that changes in topological attributes could provide clues for explaining functional shifts due to nsSNPs.
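The accuracy and Matthews correlation coefficient (MCC) quoted in the abstract above are both derived from the confusion matrix of a binary classifier. The following minimal sketch shows the two formulas; the function names and the counts are hypothetical and do not reproduce the study's data.

```python
from math import sqrt


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts: 40 deleterious and 45 neutral variants classified
# correctly, 5 false positives, 10 false negatives.
print(accuracy(tp=40, tn=45, fp=5, fn=10), mcc(tp=40, tn=45, fp=5, fn=10))
```

Unlike accuracy, the MCC stays informative when the two classes are imbalanced, which is one reason studies of this kind tend to report both.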
Hepatocellular carcinoma (HCC) remains a major cause of death and lacks reliable biomarkers. Therefore, a deeper understanding of the pathogenesis of HCC is of great importance. The emergence of circular RNA (circRNA) provides a new way to study the pathogenesis of human disease. Here, we employed a prediction tool to identify circRNAs based on RNA-seq data. Then, to investigate the biological function of the circRNAs, the candidate circRNAs were associated with protein-coding genes (PCGs) using GREAT. We found significant alterations in the expression of candidate circRNAs between normal and tumor samples. Additionally, the PCGs associated with these candidate circRNAs were also found to have discriminative expression patterns between normal and tumor samples. The enrichment analysis showed that these PCGs were predominantly enriched for liver- and cardiovascular-related diseases such as atherosclerosis, myocardial ischemia and coronary heart disease, and participated in various metabolic processes. A further network analysis indicated that these PCGs play important roles in the regulatory network and the PPI network. Finally, we built classification models to distinguish normal from tumor samples using the candidate circRNAs and their associated genes, respectively. Both obtained satisfactory results (AUC of ~0.99 for both circRNA and PCG). Our findings suggest that circRNA could be a critical factor in HCC, providing a useful resource for exploring the pathogenesis of HCC.