Accumulating studies indicate that essential proteins play critical roles in numerous biological processes. With the rapid development of high-throughput technologies, a large amount of Protein-Protein Interaction (PPI) data has been obtained for Saccharomyces cerevisiae, which facilitates the construction of PPI networks. A series of computational methods for predicting essential proteins from PPI networks has been proposed, but their prediction accuracy is still not satisfactory. In this paper, a novel prediction method called CVIM is proposed to infer potential essential proteins. In CVIM, the original PPI network is first transformed into a weighted PPI network by applying the Pearson Correlation Coefficient (PCC) to gene expression data. Then, based on the weighted PPI network and orthologous protein information, critical network topological features and protein functional features are extracted for each protein in the weighted PPI network. Finally, based on these newly extracted topological and functional features, an iterative algorithm is designed to predict essential proteins. To evaluate the identification performance of CVIM, we compared it with 13 state-of-the-art prediction methods. Experimental results show that CVIM achieves prediction accuracies of 92%, 80% and 71% among the top 1%, 5% and 10% of candidate proteins, respectively, significantly outperforming those state-of-the-art methods. These results demonstrate that the prediction accuracy of essential proteins can be effectively improved by integrating the functional and network topological characteristics of proteins, which suggests that CVIM may be an excellent addition to future protein research.
INDEX TERMS Characteristic vector, orthologous proteins, essential proteins, weighted protein-protein interaction network, iteration method.
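As an illustration of the first step described in the abstract, the following minimal Python sketch weights each PPI edge by the Pearson correlation of the two proteins' gene expression profiles. It is not the authors' implementation: the function names `pearson` and `weight_ppi_edges`, the toy protein identifiers, the expression values, and the handling of missing or constant profiles (weight 0) are all assumptions made for this example. How CVIM treats negative correlations is not stated in the abstract, so the raw coefficient is kept.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Assumption: a constant profile is treated as uncorrelated.
        return 0.0
    return cov / sqrt(var_x * var_y)


def weight_ppi_edges(edges, expression):
    """Map each PPI edge (u, v) to the PCC of the two expression profiles."""
    weights = {}
    for u, v in edges:
        if u in expression and v in expression:
            weights[(u, v)] = pearson(expression[u], expression[v])
        else:
            # Assumption: edges without expression data get weight 0.
            weights[(u, v)] = 0.0
    return weights


# Hypothetical toy data: three proteins with four expression time points each.
expression = {
    "P1": [1.0, 2.0, 3.0, 4.0],
    "P2": [2.0, 4.0, 6.0, 8.0],
    "P3": [4.0, 3.0, 2.0, 1.0],
}
edges = [("P1", "P2"), ("P1", "P3"), ("P2", "P4")]

weights = weight_ppi_edges(edges, expression)
for edge, w in weights.items():
    print(edge, round(w, 3))
```

In this toy network, P1 and P2 have perfectly co-varying profiles, P1 and P3 vary in opposite directions, and P4 has no expression data, so the three edges exercise the correlated, anti-correlated and missing-data cases.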
Since the outbreak of the novel coronavirus at the beginning of 2020, over 100,000 papers have been published related to COVID-19, with a substantial portion of them focusing on epidemiological models that describe the observed and unobserved dynamics, primarily in postdictive mode, although some models are also used for short-term forecasting. These models represent the lumped dynamics of a big city, a state or a country, but suffer from large uncertainties [1], resulting primarily from the lack of identifiability as well as the noise in the available sparse data. This lack of identifiability is related to modeling assumptions (structural identifiability), data availability and the complex biological characteristics of virus transmission, which are largely unknown and hard to measure [2] and include various emerging mutations of the virus [3]. The true scientific challenge is to recognize the large limitations of these (potentially useful) models, identify the multiple sources of uncertainty and suggest flexible models that can deal with seasonal variation in susceptibility, time delays, noisy data, under-determined systems, non-Markovian behavior and inherent stochasticity [4]. In addition to seasonal variation in transmission, for example, due to weather or mobility, for a given model the data uncertainty is usually propagated into the model parameters, rendering them as random variables/processes with an underlying probability distribution. The uncertainties in the input parameters affect the model predictability adversely, leaving many of these models inadequate for any decision-making, as they lack robustness, which is a measure of the extent to which the forward solvers amplify uncertainties from the input to the output [5]. In general, quantification of parametric input uncertainty is only based on a single given model, hence ignoring the bigger source of uncertainty associated with the model structure.
A clear example of uncertainty associated with several different models in analyzing and predicting the dynamics of this com-