Protein–protein interactions (PPIs) are involved in a wide variety of biological processes, including cell-to-cell interactions and metabolic and developmental control. Predicting PPIs has become one of the most important aims of systems biology. PPIs play a fundamental role in predicting the function of a target protein and the druggability of molecules. Much work has gone into developing computational methods for predicting PPIs, since these supplement laboratory experiments and offer a cost-effective way of predicting the most likely set of interactions at the scale of the entire proteome. This article presents a feature representation method (CAA-PPI) that extracts features from protein sequences using two different encoding strategies, followed by an ensemble learning method, random forest, as the classifier for PPI prediction. CAA-PPI considers the role of the trigram and the bond of a given amino acid with its neighbouring residues. The proposed PPI model achieved a prediction accuracy of more than 98% with one encoding scheme and more than 95% with the other on two diverse PPI datasets, H. pylori and yeast. CAA-PPI was then compared with existing sequence-based methods, and the comparison showed the proficiency of the proposed method with both encoding strategies. To further assess its practical predictive ability, a blind test was performed on datasets from five other species, independent of the training set, and the results confirmed the effectiveness of CAA-PPI with both encoding schemes.
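To make the idea of trigram-based sequence features concrete, the following is a minimal sketch in Python. It is not the CAA-PPI encoding itself, whose two schemes are not specified in this abstract; it only assumes that a trigram means three consecutive residues and that a protein pair is represented by combining the two proteins' trigram profiles. The function names `trigram_frequencies` and `pair_features` are hypothetical.

```python
from collections import Counter

# The 20 standard amino acid one-letter codes.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def trigram_frequencies(sequence):
    """Return normalized counts of overlapping 3-residue windows.

    Assumption: a plain sliding window over the raw sequence; the actual
    CAA-PPI encodings may group or weight residues differently.
    """
    seq = sequence.upper()
    windows = [seq[i:i + 3] for i in range(len(seq) - 2)]
    # Skip windows containing non-standard residue codes (e.g. X, B, Z).
    valid = [w for w in windows if set(w) <= AMINO_ACIDS]
    counts = Counter(valid)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tri: n / total for tri, n in counts.items()}


def pair_features(seq_a, seq_b):
    """Combine both proteins' trigram profiles into one sparse feature dict.

    Keys are prefixed with "A:" or "B:" so the two proteins stay distinct.
    """
    features = {}
    for tag, seq in (("A", seq_a), ("B", seq_b)):
        for tri, freq in trigram_frequencies(seq).items():
            features[f"{tag}:{tri}"] = freq
    return features
```

A sparse dictionary like this could then be converted to a fixed-length vector and passed to a random forest classifier (for example, scikit-learn's `RandomForestClassifier`), with interacting and non-interacting pairs as the two classes.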