The identification of essential proteins can help in understanding the minimum requirements for cell survival and development. Ever-increasing amounts of high-throughput data provide us with opportunities to detect essential proteins from protein interaction networks (PINs). Existing network-based approaches are limited by the poor quality of the underlying PIN data, which exhibits high rates of false positive and false negative results. To overcome this problem, researchers have focused on the prediction of essential proteins by combining PINs with other biological data, which has led to the emergence of various interactions between proteins. It remains challenging, however, to use aggregated multiplex interactions within a single analysis framework to identify essential proteins. In this study, we created a multiplex biological network (MON) by initially integrating PINs, protein domains, and gene expression profiles. Next, we proposed a new approach to discover essential proteins by extending the random walk with restart algorithm to the tensor, which provides a data model representation of the MON. In contrast to existing approaches, the proposed MON approach considers for the importance of nodes and the different types of interactions between proteins during the iteration. MON was implemented to identify essential proteins within two yeast PINs. Our comprehensive experimental results demonstrated that MON outperformed 11 other state-of-the-art approaches in terms of precision-recall curve, jackknife curve, and other criteria.
Background Accurate annotation of protein function is the key to understanding life at the molecular level and has great implications for biomedicine and pharmaceuticals. The rapid developments of high-throughput technologies have generated huge amounts of protein–protein interaction (PPI) data, which prompts the emergence of computational methods to determine protein function. Plagued by errors and noises hidden in PPI data, these computational methods have undertaken to focus on the prediction of functions by integrating the topology of protein interaction networks and multi-source biological data. Despite effective improvement of these computational methods, it is still challenging to build a suitable network model for integrating multiplex biological data. Results In this paper, we constructed a heterogeneous biological network by initially integrating original protein interaction networks, protein-domain association data and protein complexes. To prove the effectiveness of the heterogeneous biological network, we applied the propagation algorithm on this network, and proposed a novel iterative model, named Propagate on Heterogeneous Biological Networks (PHN) to score and rank functions in descending order from all functional partners, Finally, we picked out top L of these predicted functions as candidates to annotate the target protein. Our comprehensive experimental results demonstrated that PHN outperformed seven other competing approaches using cross-validation. Experimental results indicated that PHN performs significantly better than competing methods and improves the Area Under the Receiver-Operating Curve (AUROC) in Biological Process (BP), Molecular Function (MF) and Cellular Components (CC) by no less than 33%, 15% and 28%, respectively. Conclusions We demonstrated that integrating multi-source data into a heterogeneous biological network can preserve the complex relationship among multiplex biological data and improve the prediction accuracy of protein function by getting rid of the constraints of errors in PPI networks effectively. PHN, our proposed method, is effective for protein function prediction.
Background The accurate characterization of protein functions is critical to understanding life at the molecular level and has a huge impact on biomedicine and pharmaceuticals. Computationally predicting protein function has been studied in the past decades. Plagued by noise and errors in protein–protein interaction (PPI) networks, researchers have undertaken to focus on the fusion of multi-omics data in recent years. A data model that appropriately integrates network topologies with biological data and preserves their intrinsic characteristics is still a bottleneck and an aspirational goal for protein function prediction. Results In this paper, we propose the RWRT (Random Walks with Restart on Tensor) method to accomplish protein function prediction by applying bi-random walks on the tensor. RWRT firstly constructs a functional similarity tensor by combining protein interaction networks with multi-omics data derived from domain annotation and protein complex information. After this, RWRT extends the bi-random walks algorithm from a two-dimensional matrix to the tensor for scoring functional similarity between proteins. Finally, RWRT filters out possible pretenders based on the concept of cohesiveness coefficient and annotates target proteins with functions of the remaining functional partners. Experimental results indicate that RWRT performs significantly better than the state-of-the-art methods and improves the area under the receiver-operating curve (AUROC) by no less than 18%. Conclusions The functional similarity tensor offers us an alternative, in that it is a collection of networks sharing the same nodes; however, the edges belong to different categories or represent interactions of different nature. We demonstrate that the tensor-based random walk model can not only discover more partners with similar functions but also free from the constraints of errors in protein interaction networks effectively. We believe that the performance of function prediction depends greatly on whether we can extract and exploit proper functional similarity information on protein correlations.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.