The need for larger-scale and increasingly complex protein-protein interaction (PPI) prediction tasks demands that state-of-the-art predictors be highly efficient and adapted to inter- and cross-species predictions. Furthermore, the ability to generate comprehensive interactomes has enabled the appraisal of each PPI in the context of all predictions, leading to further improvements in classification performance in the face of extreme class imbalance using the Reciprocal Perspective (RP) framework. Here we describe the PIPE4 algorithm. Adaptation of the PIPE3/MP-PIPE sequence preprocessing step led to a speedup of upwards of 50x, and the new Similarity Weighted Score appropriately normalizes for window frequency when applied to any inter- and cross-species prediction schemas. Comprehensive interactomes for three prediction schemas are generated: (1) cross-species predictions, where Arabidopsis thaliana is used as a proxy to predict the comprehensive Glycine max interactome, (2) inter-species predictions between Homo sapiens and HIV1, and (3) a combined schema involving both cross- and inter-species predictions, where both Arabidopsis thaliana and Caenorhabditis elegans are used as proxy species to predict the interactome between Glycine max (the soybean legume) and Heterodera glycines (the soybean cyst nematode). Compared with the state of the art, PIPE4 showed improved performance, indicating that it should be the method of choice for complex PPI prediction schemas.
The elucidation of protein-protein interaction (PPI) networks is central to molecular biology research. Necessary for producing mechanistic models of cellular processes, PPI networks additionally contribute to challenges such as the prediction of gene function 1-3, identification of disease genes 4, and pharmaceutical discovery 5,6. Computational PPI prediction techniques have been developed to supplement and guide wet-laboratory experimental work.
The last decade has seen increased computational demand in both the scale and complexity of PPI predictors. Predicting comprehensive interactomes (the set of all possible pairwise PPIs in or between proteomes) has only recently become possible with the advent of high-performance computing infrastructure and algorithmic optimizations. While methodologically diverse in their implementation, PPI prediction tools generally exploit information from the set of known PPIs (previously confirmed using classical wet-laboratory techniques) to determine whether any two query proteins will physically interact. The utility and scalability of any one method are subject to the information it leverages. Structure-based methods, at one extreme, require the three-dimensional (3D) characterization of each protein and therefore suffer from low coverage of the proteome. While useful for determining highly specific PPI networks, many methods require template-based modelling, which tends to be computationally taxing 7-9. Furthermore, even with complete 3D structural information of each protein in an organism's proteome, the computational time comp...
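To give a sense of the scale implied by a comprehensive interactome as defined above, the following minimal sketch counts the candidate pairs a predictor would have to score. The function names and the proteome sizes are hypothetical, chosen for illustration only, and are not taken from the paper.

```python
def intra_species_pairs(n: int, include_self: bool = True) -> int:
    """Count unordered protein pairs within a single proteome of n proteins.

    With include_self=True, self-pairs (possible homodimers) are counted too.
    """
    if n < 0:
        raise ValueError("proteome size must be non-negative")
    return n * (n + 1) // 2 if include_self else n * (n - 1) // 2


def inter_species_pairs(n: int, m: int) -> int:
    """Count pairs between two proteomes of n and m proteins."""
    if n < 0 or m < 0:
        raise ValueError("proteome sizes must be non-negative")
    return n * m


if __name__ == "__main__":
    # Hypothetical, round proteome sizes (assumptions, not measured values).
    host_size = 20_000
    pathogen_size = 10
    print("within host:", intra_species_pairs(host_size))
    print("host vs. pathogen:", inter_species_pairs(host_size, pathogen_size))
```

The within-proteome count grows quadratically with proteome size, which is one reason the per-pair cost of a predictor matters so much at this scale.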
The soybean crop, Glycine max (L.) Merr., is consumed by humans, Homo sapiens, worldwide. While the respective bodies of literature and -omics data for each of these organisms are extensive, comparatively few studies investigate the molecular biological processes occurring between the two. We are interested in elucidating the network of protein–protein interactions (PPIs) involved in human–soybean allergies. To this end, we leverage state-of-the-art sequence-based PPI predictors amenable to predicting the enormous comprehensive interactome between human and soybean. A network-based analytical approach is proposed, leveraging similar interaction profiles to identify candidate allergens and proteins involved in the allergy response. Interestingly, the predicted interactome can be explored from two complementary perspectives: which soybean proteins are predicted to interact with specific human proteins and which human proteins are predicted to interact with specific soybean proteins. A total of eight proteins (six specific to the human proteome and two to the soy proteome) have been identified and supported by the literature to be involved in human health, specifically related to immunological and neurological pathways. This study, beyond generating the most comprehensive human–soybean interactome to date, elucidated a soybean seed interactome and identified several proteins putatively consequential to human health.
This thesis explores issues arising when one attempts to predict protein-protein interactions (PPI) involving multiple species using the Protein-protein Interaction Prediction Engine (PIPE) method. In cross-species predictions, where one predicts PPI in a target species given known PPI in a different training species, we showed that prediction performance is inversely correlated with the evolutionary distance between training and target species. With a change in the score calculation, we improved the area under the precision-recall curve by 45% when using seven well-studied species to predict an eighth. In inter-species predictions, one attempts to predict interactions between proteins arising from two different species, such as a host and a pathogen. For the first time, we have shown that PIPE is able to predict such inter-species PPI by predicting 229 novel PPI between HIV and human at an estimated precision of 82% (100:1 class imbalance). Lastly, by modifying a main data structure of PIPE, we also improved the speed of the PIPE algorithm by a factor of 53x when predicting H. sapiens PPI. Using the methods developed in this thesis, we have predicted all possible PPI between soybean and the Soybean Cyst Nematode pathogen. Collaborators at Agriculture and Agri-Food Canada will be pursuing and validating these predictions as they seek to combat this costly pest.
Acknowledgements
I would like to thank my supervisor, James Green, for his support, patience and guidance throughout this experience. I am grateful for the opportunity he gave me to pursue this research and for all the knowledge he shared with me throughout this time.
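The thesis abstract above quotes its precision together with an assumed 100:1 class imbalance because precision depends on how many non-interacting pairs there are per interacting pair. The sketch below shows that dependence using the standard identity precision = TPR / (TPR + r * FPR), where r is the number of negatives per positive. The function name and the operating point (TPR and FPR values) are hypothetical and are not results from the thesis.

```python
def precision_at_imbalance(tpr: float, fpr: float, neg_per_pos: float) -> float:
    """Expected precision for a classifier operating point at a given imbalance.

    tpr: true positive rate (recall) at the chosen score threshold
    fpr: false positive rate at the same threshold
    neg_per_pos: assumed number of negative pairs per positive pair
    """
    denominator = tpr + neg_per_pos * fpr
    if denominator == 0:
        raise ValueError("no positive predictions at this operating point")
    return tpr / denominator


if __name__ == "__main__":
    # Hypothetical operating point (assumption, for illustration only).
    tpr, fpr = 0.5, 0.001
    for ratio in (1, 100, 1000):
        p = precision_at_imbalance(tpr, fpr, ratio)
        print(f"{ratio}:1 imbalance -> precision {p:.3f}")
```

At a fixed threshold, the same classifier looks far less precise as the assumed imbalance grows, so a precision figure is only interpretable alongside the imbalance it was estimated under.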