Background: Identifying the interactions between proteins and long non-coding RNAs (lncRNAs) is of great importance for deciphering the functional mechanisms of lncRNAs. However, current experimental techniques for detecting lncRNA-protein interactions are limited and inefficient. Many methods have been proposed to predict protein-lncRNA interactions, but few studies make use of the topological information of the heterogeneous biological networks associated with lncRNAs.
Results: In this work, we propose a novel approach, PLIPCOM, which uses two groups of network features to detect protein-lncRNA interactions. In particular, diffusion features and HeteSim features are extracted from a protein-lncRNA heterogeneous network and then combined to build the prediction model using the Gradient Tree Boosting (GTB) algorithm. Our study highlights that the topological features of the heterogeneous network are crucial for predicting protein-lncRNA interactions. Cross-validation experiments on the benchmark dataset show that PLIPCOM substantially outperforms previous state-of-the-art approaches in predicting protein-lncRNA interactions. We also demonstrate the robustness of the proposed method on three unbalanced data sets. Moreover, our case studies show that the method is effective and reliable in predicting the interactions between lncRNAs and proteins.
Availability: The source code and supporting files are publicly available at: http://denglab.org/PLIPCOM/.
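To make the idea of a diffusion feature concrete, the sketch below runs a random walk with restart on a tiny heterogeneous network. The abstract does not say how PLIPCOM computes its diffusion features, so the choice of random walk with restart, the function name, the restart probability, and the four-node toy network are all assumptions made for illustration.

```python
def random_walk_with_restart(adjacency, seed, restart=0.5, tol=1e-10, max_iter=1000):
    """Return the steady-state visiting probabilities of a walker that
    jumps back to `seed` with probability `restart` at every step."""
    n = len(adjacency)
    # Column-normalise so that transition[i][j] is the probability of
    # moving from node j to node i.
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    transition = [
        [adjacency[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]
    start = [1.0 if i == seed else 0.0 for i in range(n)]
    p = start[:]
    for _ in range(max_iter):
        nxt = [
            (1 - restart) * sum(transition[i][j] * p[j] for j in range(n))
            + restart * start[i]
            for i in range(n)
        ]
        # Stop once the distribution no longer changes (L1 distance).
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Hypothetical toy network: nodes 0 and 1 are lncRNAs, nodes 2 and 3 are proteins.
# Edges: lncRNA0-protein2, lncRNA1-protein2, lncRNA1-protein3, protein2-protein3.
toy_network = [
    [0, 0, 1, 0],
    [0, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 1, 1, 0],
]

# The resulting vector can serve as a diffusion profile (feature vector) for lncRNA 0.
profile = random_walk_with_restart(toy_network, seed=0)
print([round(x, 4) for x in profile])
```

In this toy case the walker started at lncRNA 0 visits protein 2 more often than protein 3, which matches the intuition that directly connected nodes receive more diffused probability. A profile of this kind, computed per node, is one plausible input to a gradient tree boosting classifier.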
Background: RNA-binding proteins play important roles in post-transcriptional RNA processing and transcriptional regulation. Distinguishing the RNA-binding residues in proteins is crucial for understanding how protein and RNA recognize each other and function together as a complex.
Results: We propose PredRBR, an effective computational approach to predict RNA-binding residues. PredRBR is built with gradient tree boosting and an optimal feature set selected from a large number of sequence and structure characteristics and two categories of structural neighborhood properties. Cross-validation experiments on the RBP170 data set show that PredRBR achieves an overall accuracy of 0.84, a sensitivity of 0.85, an MCC of 0.55 and an AUC of 0.92, which are significantly better than those of other widely used machine learning algorithms such as Support Vector Machine, Random Forest, and AdaBoost. We further calculate the importance of the different feature categories and find that structural neighborhood characteristics are critical in the recognition of RNA-binding residues. PredRBR also yields significantly better prediction accuracy on an independent test set (RBP101) than other state-of-the-art methods.
Conclusions: The superior performance over existing RNA-binding residue prediction methods indicates the importance of the gradient tree boosting algorithm combined with the optimally selected features.
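For readers less familiar with the reported metrics, the following sketch shows how accuracy, sensitivity, MCC and AUC are defined for a binary residue classifier. The function names and all counts and scores are hypothetical and are not taken from the PredRBR experiments.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity and Matthews correlation coefficient (MCC)
    from the four cells of a binary confusion matrix."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity, "mcc": mcc}


def auc(scores, labels):
    """Area under the ROC curve as the fraction of (positive, negative)
    pairs ranked correctly; ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up confusion-matrix counts and prediction scores, for illustration only.
metrics = binary_metrics(tp=50, fp=10, tn=30, fn=10)
example_auc = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
print(metrics, example_auc)
```

MCC is often reported alongside accuracy for binding-residue prediction because binding residues are usually a minority class, and MCC accounts for all four confusion-matrix cells.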
Protein–nucleic acid complexes play essential roles in regulating transcription, translation, DNA replication, repair and recombination, and RNA processing and translocation. Site-directed mutagenesis has been extremely useful in understanding the principles of protein–DNA and protein–RNA interactions, and experimentally determined mutagenesis data are prerequisites for designing effective algorithms that predict the change in binding affinity upon mutation. However, a vital challenge in this area is the lack of sufficient publicly available, experimentally validated mutation data, which makes it difficult to develop computational prediction methods. In this article, we present Nabe, an integrated database of amino acid mutations and their effects on the binding free energy in protein–DNA and protein–RNA interactions for which binding affinities have been experimentally determined. Compared with existing databases and data sets, Nabe is the largest protein–nucleic acid mutation database, containing 2506 mutations in 473 protein–DNA and protein–RNA complexes, of which 1751 are alanine mutations in 405 protein–nucleic acid complexes. So that researchers can conveniently use the data, Nabe assembles protein–DNA and protein–RNA benchmark databases by adopting the data-processing procedures used by the majority of models. To make the data easier to query, Nabe also provides a searchable and graphical web page.
Database URL: http://nabe.denglab.org
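As a worked illustration of what an effect on the binding free energy means, the sketch below converts dissociation constants into a binding free energy change using the standard relation ΔG = RT ln(Kd), so that ΔΔG = ΔG(mutant) − ΔG(wild type). The function names and the Kd values are hypothetical, and the abstract does not state how Nabe itself derives or normalizes its values.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # kcal / (mol * K)


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Binding free energy in kcal/mol from a dissociation constant in mol/L."""
    return GAS_CONSTANT_KCAL * temperature_k * math.log(kd_molar)


def ddg_upon_mutation(kd_wild, kd_mutant, temperature_k=298.15):
    """Change in binding free energy; positive values mean weaker binding."""
    return (binding_free_energy(kd_mutant, temperature_k)
            - binding_free_energy(kd_wild, temperature_k))


# Hypothetical example: a mutation weakens binding tenfold (1 nM -> 10 nM).
ddg = ddg_upon_mutation(kd_wild=1e-9, kd_mutant=1e-8)
print(round(ddg, 2))
```

Under this sign convention a tenfold loss of affinity at 25 °C corresponds to roughly +1.4 kcal/mol, which gives a sense of scale for mutation effects.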