Development of novel computational approaches for modeling protein properties from their primary structure is the main goal in applied proteomics. In this work, we report the extension of the autocorrelation vector formalism to amino acid sequences in order to encode protein structural information for modeling purposes. Amino acid sequence autocorrelation (AASA) vectors were calculated by measuring the autocorrelations at sequence lags ranging from 1 to 15 on the protein primary structure for 48 amino acid/residue properties selected from the AAindex database. A total of 720 AASA descriptors were tested for building predictive models of the change in thermal unfolding Gibbs free energy (ΔΔG) of gene V protein upon mutation. To this end, ensembles of Bayesian-regularized genetic neural networks (BRGNNs) were used to obtain an optimum nonlinear model for the conformational stability. The ensemble predictor explained about 88% and 66% of the variance of the data in the training and test sets, respectively. Furthermore, the optimum AASA vector subset not only modeled unfolding stability successfully but also distributed the wild-type and mutant gene V proteins well on a stability self-organized map (SOM) when used for unsupervised training of competitive neurons.
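As a rough illustration of how such descriptors could be computed, the sketch below builds an autocorrelation vector for one residue property over lags 1 to 15. With 48 properties and 15 lags each, this gives the 720 descriptors mentioned above. The normalization (mean of the products of property values for residues separated by the lag) is an assumption, as are the function name `aasa_vector` and the toy property scale; the abstract does not give the exact formula, and real property values would come from AAindex.

```python
from typing import Dict, List


def aasa_vector(sequence: str, prop: Dict[str, float], max_lag: int = 15) -> List[float]:
    """Autocorrelation of one residue property at lags 1..max_lag.

    Assumed definition: for each lag, average the product of the property
    values of every pair of residues separated by that lag.
    """
    values = [prop[res] for res in sequence]
    n = len(values)
    vector = []
    for lag in range(1, max_lag + 1):
        pairs = n - lag  # number of residue pairs available at this lag
        if pairs <= 0:
            # Sequence is shorter than the lag: no pairs, so report zero.
            vector.append(0.0)
            continue
        total = sum(values[i] * values[i + lag] for i in range(pairs))
        vector.append(total / pairs)
    return vector


# Hypothetical three-letter property scale, for illustration only.
toy_scale = {"A": 1.0, "G": 0.0, "L": 2.0}
vec = aasa_vector("AGLAGL", toy_scale, max_lag=3)
```

Repeating the call for each of the selected properties and concatenating the results would give one descriptor row per protein sequence, which could then be fed to a feature-selection step.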
This paper presents a new algorithm that can be used to compute an approximation to the median of a set of strings. The approximate median is obtained through successive improvements of a partial solution. The edit distance from the partial solution to all the strings in the set is computed in each iteration, thus accounting for the fre- [...] Comparative experiments involving Freeman chain codes encoding 2D shapes and the Copenhagen chromosome database show that the quality of the approximate median string is similar to that of benchmark approaches, while convergence is much faster.
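The quantity that a median string minimizes can be sketched as follows. This is not the algorithm proposed in the paper: it only shows the standard Levenshtein edit distance, the sum of distances from a candidate to every string in the set, and the set median (the best candidate taken from the set itself), which is a common starting point for iterative refinement. The function names and the toy chain-code strings are illustrative assumptions.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit-cost insert, delete and substitute."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def total_distance(candidate: str, strings: List[str]) -> int:
    """Sum of edit distances from the candidate to every string in the set."""
    return sum(edit_distance(candidate, s) for s in strings)


def set_median(strings: List[str]) -> str:
    """String from the set with the smallest total distance to the others."""
    return min(strings, key=lambda s: total_distance(s, strings))


# Toy strings over the Freeman chain-code alphabet (digits 0-7).
codes = ["0012", "0112", "0011"]
start = set_median(codes)
```

An approximate median search would then try edit operations on `start` and keep those that lower `total_distance`.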
Development of novel computational approaches for modeling protein properties from their primary structure is a main goal in applied proteomics. In this work, we report the extension of the autocorrelation vector formalism to amino acid sequences in order to encode protein structural information for modeling purposes. Amino Acid Sequence Autocorrelation (AASA) vectors were calculated by measuring the autocorrelations at sequence lags ranging from 1 to 15 on the protein primary structure for 48 amino acid/residue properties selected from the AAindex database. A total of 720 AASA descriptors were tested for building predictive models of the thermal unfolding Gibbs free energy change of human lysozyme mutants. To this end, ensembles of Bayesian-Regularized Genetic Neural Networks (BRGNNs) were used to obtain an optimum nonlinear model for the conformational stability. The ensemble predictor explained about 88% and 68% of the variance of the data in the training and test sets, respectively. Furthermore, the optimum AASA vector subset was shown not only to model unfolding thermal stability successfully but also to distribute wild-type and mutant lysozymes on a stability Self-organized Map (SOM) when used for unsupervised training of competitive neurons.