The accuracy of gene prediction methods based on power spectrum analysis depends on the threshold used to discriminate between protein-coding and non-coding sequences in eukaryotic genomes. Because gene structure varies among eukaryotes, it is difficult to determine the best prediction threshold for a given organism from prior biological knowledge alone. To improve the accuracy of these methods, we developed a novel method based on a bootstrap algorithm to infer organism-specific optimal thresholds for eukaryotes. As prior information, our method requires only a few annotated protein-coding regions from the organism being studied. On our test datasets, the calculated optimal thresholds give an average prediction accuracy of 81%, an increase of 19% over that obtained using the same empirical threshold P=4 for all datasets. The proposed method is simple, convenient, and easily applied to infer optimal thresholds for predicting coding regions in the genomes of most organisms.
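As a rough illustration of the kind of score such a threshold is applied to, the sketch below computes a period-3 signal-to-noise ratio for a DNA sequence and compares it with the empirical threshold P=4 mentioned above. The abstract does not give the formula, so this assumes a commonly used definition: the total power spectrum of the four binary base-indicator sequences at frequency N/3, divided by the mean power over the nonzero frequencies. The function names `period3_snr` and `predict_coding` are hypothetical.

```python
import cmath
import math


def period3_snr(seq):
    """Ratio of the total power at frequency N/3 to the mean power over k = 1..N-1.

    Assumed definition (not taken from the abstract): the power spectrum is the
    sum of |DFT|^2 over the four binary indicator sequences for A, C, G and T.
    """
    seq = seq.upper()
    n = len(seq) - len(seq) % 3          # trim so that N is a multiple of 3
    seq = seq[:n]
    if n < 3:
        raise ValueError("sequence must contain at least 3 bases")

    # At k = N/3 the DFT phase exp(-2*pi*i*k*pos/N) depends only on pos mod 3.
    phases = [cmath.exp(-2j * math.pi * r / 3) for r in range(3)]

    peak = 0.0            # S(N/3), summed over the four bases
    total = 0             # number of A/C/G/T symbols
    sum_sq_counts = 0     # S(0): sum of squared base counts
    for base in "ACGT":
        coeff = sum(phases[pos % 3] for pos, ch in enumerate(seq) if ch == base)
        peak += abs(coeff) ** 2
        count = seq.count(base)
        total += count
        sum_sq_counts += count * count

    # Parseval: the sum of S(k) over k = 0..N-1 equals N * total; drop the k = 0 term.
    mean_power = (n * total - sum_sq_counts) / (n - 1)
    return peak / mean_power if mean_power > 0 else 0.0


def predict_coding(seq, threshold=4.0):
    """Call a sequence coding when its score reaches the threshold (P=4 by default)."""
    return period3_snr(seq) >= threshold


if __name__ == "__main__":
    toy = "ATG" * 50                     # artificial, perfectly 3-periodic sequence
    print(period3_snr(toy), predict_coding(toy))
```

An organism-specific threshold would simply be passed as the `threshold` argument in place of the default.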
The accuracy of methods based on power spectrum analysis depends on the threshold used to discriminate between coding and non-coding sequences. Because gene structure differs among organisms, we inferred that each organism has its own optimal gene prediction threshold. To test this, we analyzed real biological data and found that the optimal thresholds do indeed differ among organisms when methods based on power spectrum analysis are used to predict genes.
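One generic way to estimate an organism-specific threshold from a handful of annotated coding regions is a percentile bootstrap over their power spectrum scores. The sketch below is only that: a generic resampling example, not the algorithm of the work described here, whose details are not given. The function name `bootstrap_threshold`, the choice of the 10th percentile, and the number of resamples are all assumptions made for illustration.

```python
import random
import statistics


def bootstrap_threshold(coding_scores, quantile=0.1, n_resamples=1000, seed=0):
    """Bootstrap estimate of a low quantile of the scores of known coding regions.

    Assumption: a threshold is chosen so that most annotated coding regions
    score above it. Each resample draws len(scores) values with replacement,
    and the returned value is the mean of the per-resample quantiles.
    """
    scores = list(coding_scores)
    if not scores:
        raise ValueError("at least one score is required")
    rng = random.Random(seed)            # fixed seed for reproducibility
    estimates = []
    for _ in range(n_resamples):
        sample = sorted(rng.choices(scores, k=len(scores)))
        index = min(len(sample) - 1, int(quantile * len(sample)))
        estimates.append(sample[index])
    return statistics.mean(estimates)


if __name__ == "__main__":
    # Made-up scores for illustration only; not data from any study.
    example_scores = [3.1, 4.8, 5.2, 6.0, 7.4, 9.9, 12.3, 2.7]
    print(bootstrap_threshold(example_scores))
```

The estimate always lies between the smallest and largest input score, so a few well-chosen annotated regions bound the threshold directly.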
Prediction of protein folding rates from amino acid sequences is one of the most important challenges in computational and molecular biology. Over the past few years, many methods have been developed to capture the correlation between folding rates and protein structures and sequences. In this paper, based on the concept of Chou's pseudo-amino acid composition, we present an effective method to predict protein folding rates from amino acid sequences, without any knowledge of tertiary or secondary structure or structural class. The originality of this work is that it accounts for the effect of sequence-order information. The proposed method gives a good correlation between predicted and experimental folding rates: the correlation coefficient is 0.67 for 76 proteins when evaluated with the leave-one-out jackknife test. The comparative results demonstrate that our approach is better than most other methods.
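To make the idea of sequence-order information concrete, the sketch below builds a pseudo-amino acid composition vector in the general form usually attributed to Chou: 20 composition terms followed by `lam` sequence-order correlation terms, all normalized to sum to 1. It is a minimal sketch under stated assumptions, not the feature set of the paper: the function names, the default `lam=2` and `weight=0.05`, and the toy property table in the usage example are hypothetical, and real physicochemical scales would have to be supplied by the user.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def standardize(prop):
    """Rescale a property table to zero mean and unit variance over the 20 residues."""
    values = [prop[a] for a in AMINO_ACIDS]
    mean = sum(values) / len(values)
    sd = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    if sd == 0:
        raise ValueError("property table must not be constant")
    return {a: (prop[a] - mean) / sd for a in AMINO_ACIDS}


def pseaac(seq, properties, lam=2, weight=0.05):
    """Return a (20 + lam)-dimensional pseudo-amino acid composition vector.

    `properties` is a list of dicts mapping each of the 20 residues to a number.
    """
    seq = seq.upper()
    if any(ch not in AMINO_ACIDS for ch in seq):
        raise ValueError("sequence contains non-standard residues")
    length = len(seq)
    if lam >= length:
        raise ValueError("lam must be smaller than the sequence length")

    props = [standardize(p) for p in properties]
    freqs = [seq.count(a) / length for a in AMINO_ACIDS]

    # theta_j: mean squared property difference between residues j positions apart
    thetas = []
    for j in range(1, lam + 1):
        total = 0.0
        for i in range(length - j):
            total += sum((p[seq[i]] - p[seq[i + j]]) ** 2 for p in props) / len(props)
        thetas.append(total / (length - j))

    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]


if __name__ == "__main__":
    # Toy property table (alphabet index), made up purely for illustration.
    toy_property = {a: float(i) for i, a in enumerate(AMINO_ACIDS)}
    features = pseaac("ACDEFGHIKLMNPQRSTVWYAC", [toy_property], lam=2)
    print(len(features), sum(features))
```

A vector like this could then serve as input to a regression model of folding rate, evaluated by leaving out one protein at a time.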