The rapid advances in proteomic analysis, coupled with the completion of multiple genomes, have increased the demand for determining protein function. A first step is classifying proteins into families, that is, predicting the family to which a protein belongs. A method was developed for predicting protein families from protein sequence alone using support vector machine (SVM) models. In these models, the amino acids were grouped into three categories (apolar, polar, and charged). Fragments of one to five consecutive residues were annotated by amino acid type to define the features of each protein. SVM models were constructed from the features of a training set of proteins and then evaluated on an independent set of proteins. The approach was tested on 20 protein families from the iProClass database of the Protein Information Resource (PIR). Two-class SVM models achieved an average prediction accuracy of 0.9985, and multi-class SVM models achieved an accuracy of 0.9941. This study demonstrates that SVM-based methods can accurately recognize and predict the protein family to which a sequence belongs based solely on its primary amino acid sequence.
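The fragment-based feature construction can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the assignment of residues to the apolar, polar, and charged classes, the class letters `A`/`P`/`C`, the helper names `feature_names` and `encode`, and the per-length normalization are all hypothetical, since the abstract does not specify them. Each sequence is translated into a string of class letters, and every pattern of one to five consecutive classes is counted, giving 3 + 9 + 27 + 81 + 243 = 363 features per protein.

```python
from itertools import product

# ASSUMPTION: hypothetical residue-to-class assignment; the abstract names
# the three categories but not which residues belong to each.
AA_CLASS = {
    **{aa: "A" for aa in "AVLIMFWPG"},  # apolar
    **{aa: "P" for aa in "STCNQY"},     # polar
    **{aa: "C" for aa in "DEKRH"},      # charged
}
CLASSES = "APC"
MAX_LEN = 5  # fragments of one to five consecutive residues


def feature_names(max_len: int = MAX_LEN) -> list[str]:
    """All class patterns of length 1..max_len, in a fixed order."""
    return ["".join(p) for k in range(1, max_len + 1)
            for p in product(CLASSES, repeat=k)]


def encode(sequence: str, max_len: int = MAX_LEN) -> list[float]:
    """Map a protein sequence to normalized counts of class patterns."""
    # Translate residues to class letters. Non-standard residues (X, B, Z, ...)
    # are dropped, which joins the residues on either side (a simplification).
    classes = "".join(AA_CLASS[aa] for aa in sequence.upper() if aa in AA_CLASS)
    counts = dict.fromkeys(feature_names(max_len), 0)
    for k in range(1, max_len + 1):
        # Slide a window of length k over the class string.
        for i in range(len(classes) - k + 1):
            counts[classes[i:i + k]] += 1
    # ASSUMPTION: normalize each fragment length by its number of windows,
    # so features are comparable across sequences of different lengths.
    features = []
    for name, c in counts.items():
        windows = len(classes) - len(name) + 1
        features.append(c / windows if windows > 0 else 0.0)
    return features
```

The resulting fixed-length vectors can then be passed to any SVM implementation (for example, scikit-learn's `sklearn.svm.SVC`) for two-class or multi-class training; counting class patterns rather than raw residues keeps the feature space small (363 dimensions instead of 20^5 for length-5 fragments alone).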
A method was proposed for estimating noise relative to signal in microarray data. A signal-to-noise index (SNI) was defined and used to measure the level of signal relative to the noise contained in two microarray data sets. Simulations were conducted to establish the quantitative relationship between the SNI and relative noise. The method was applied to two well-known microarray data sets. The relative noise estimated for both data sets was consistent with the observations reported in the original papers, demonstrating that the proposed method is reliable for estimating relative noise in microarray data.