Background: Mutation of a single amino acid residue can cause changes in a protein, which may then lead to a loss of protein function. Predicting protein stability changes can provide candidates for novel protein design. Although many prediction tools are available, conflicting results from different tools can confuse users. Results: We propose an integrated predictor, iStable, built on a grid computing architecture, that uses sequence information together with the prediction results of different element predictors. Several machine learning methods were evaluated for the learning model, and a support vector machine was adopted as the integrator rather than simply taking the majority answer given by the element predictors. The role of sequence information in the model was also analyzed, and a window size of 11 was selected. iStable accepts two input types: structural and sequential. After training and cross-validation, iStable performed better than all of the element predictors on several datasets. It also showed better overall performance when validated across different secondary structure types, relative solvent accessibility conditions, protein superfamilies, and experimental conditions. Conclusions: The trained and validated version of iStable provides an accurate approach for predicting protein stability changes. iStable is freely available online at http://predictor.nchu.edu.tw/iStable.
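As an illustration of the sequence-window idea mentioned above, the following is a minimal sketch of extracting an 11-residue window centred on a mutation site and one-hot encoding it. The abstract does not describe how iStable pads windows near the termini or how residues are encoded, so the padding character `X`, the one-hot scheme, and the function names `window_around` and `one_hot` are all assumptions made for this example.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def window_around(sequence, position, size=11, pad="X"):
    """Return `size` residues centred on the 0-based `position`.

    Positions that fall outside the sequence are filled with `pad`
    (an assumed convention, not taken from the abstract).
    """
    if size % 2 == 0:
        raise ValueError("window size must be odd")
    if not 0 <= position < len(sequence):
        raise IndexError("position outside sequence")
    half = size // 2
    padded = pad * half + sequence + pad * half
    # After padding, the residue at `position` sits at index position + half,
    # so the window starts at index `position` of the padded string.
    return padded[position:position + size]


def one_hot(window):
    """Encode each residue as a 20-element 0/1 vector; padding is all zeros."""
    vector = []
    for residue in window:
        vector.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return vector


if __name__ == "__main__":
    win = window_around("MKTAYIAKQRQISFVKSHFSRQ", 2)
    print(win, len(one_hot(win)))
```

A feature vector like this could be concatenated with the outputs of the element predictors before being passed to an integrator such as a support vector machine.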
Small interfering RNA (siRNA) has been used widely to induce gene silencing in cells. To predict the efficacy of an siRNA with respect to inhibition of its target mRNA, we developed a two-layer system, siPRED, which is based on various characteristic methods in the first layer and fusion mechanisms in the second layer. Characteristic methods were constructed by support vector regression from three categories of characteristics, namely sequence, features, and rules. Fusion mechanisms considered combinations of characteristic methods in different categories and were implemented by support vector regression and neural networks to yield integrated methods. In siPRED, the prediction of siRNA efficacy through integrated methods was better than through any single characteristic method. Moreover, the weighting of each characteristic method within the integrated methods was established by genetic algorithms so that the effect of each characteristic method could be revealed. On a validation dataset, siPRED performed better than other predictive systems that used the scoring method, neural networks, or linear regression. Finally, siPRED can be improved to achieve a correlation coefficient of 0.777 when the threshold of the whole stacking energy is ≥−34.6 kcal/mol. siPRED is freely available on the web at http://predictor.nchu.edu.tw/siPRED.
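The two-layer idea can be sketched in a much simplified form: first-layer scores from several characteristic methods are combined by a weighted average, and the fused score is compared with measured efficacy using a correlation coefficient. This is a stand-in for illustration only. siPRED itself uses support vector regression and neural networks for fusion and genetic algorithms for the weights; the weighted average, the use of Pearson's coefficient, the function names, and the example values below are all assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


def fuse(first_layer_scores, weights):
    """Weighted average of first-layer predictions.

    first_layer_scores: dict mapping method name -> list of predicted scores
    weights:            dict mapping method name -> non-negative weight
    """
    total = sum(weights[name] for name in first_layer_scores)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n = len(next(iter(first_layer_scores.values())))
    return [
        sum(weights[name] * scores[i] for name, scores in first_layer_scores.items()) / total
        for i in range(n)
    ]


if __name__ == "__main__":
    # Made-up numbers, purely to show the call pattern.
    scores = {"sequence": [0.2, 0.4, 0.6], "rules": [0.4, 0.6, 0.8]}
    fused = fuse(scores, {"sequence": 1.0, "rules": 1.0})
    print(fused, pearson(fused, [1.0, 2.0, 3.0]))
```

In this sketch the weights are supplied by hand; a search procedure such as a genetic algorithm would instead choose them to maximize the correlation on training data.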
Oncidium 'Gower Ramsey' is a valuable and successful commercial orchid for the floriculture industry in Taiwan. However, no reference covering the full sequences of the transcribed genes currently exists for Oncidium orchids to facilitate molecular biological studies and the breeding of these orchids. In this study, we generated Oncidium cDNA libraries for six different organs: leaves, pseudobulbs, young inflorescences, inflorescences, flower buds and mature flowers. We used 454-pyrosequencing technology to perform high-throughput deep sequencing of the Oncidium transcriptome, yielding >0.9 million reads with an average length of 328 bp, for a total of 301 million bases. De novo assembly of the sequences yielded 50,908 contig sequences with an average length of 493 bp from 796,463 reads and 120,219 singletons. The assembled sequences were annotated using BLAST, and a total of 12,757 and 13,931 unigene transcripts from the Arabidopsis and rice genomes were matched by TBLASTX, respectively. A Gene Ontology (GO) analysis of the annotated Oncidium contigs revealed that the majority of sequenced genes were associated with 'unknown molecular function', 'cellular process' and 'intracellular components'. Furthermore, a complete flowering-associated expressed sequence that included most of the genes in the photoperiod pathway and the 15 CONSTANS-LIKE (COL) homologs with the conserved CCT domain was obtained in this collection. These data indicate that the Oncidium expressed sequence tag (EST) database generated in this study has sufficient coverage to be used as a tool to investigate the flowering pathway and various other biological pathways in orchids. An OncidiumOrchidGenomeBase (OOGB) website has been constructed and is publicly available online (http://predictor.nchu.edu.tw/oogb/).
Most modern tools used to predict sites of small ubiquitin-like modifier (SUMO) binding (referred to as SUMOylation) use algorithms, chemical features of the protein, and consensus motifs. However, these tools rarely consider how post-translational modification (PTM) information for other sites within the same protein influences the accuracy of prediction results. This study applied the Random Forest machine learning method, together with motif screening models and a feature selection combination mechanism, to develop a SUMOylation prediction system, referred to as SUMOgo. In the prediction method, PTM sites were coded as new functional features in addition to structural features, such as sequence-based binary coding, encoded chemical features of proteins, and encoded secondary structure information that is important for PTM. Twenty cycles of prediction were conducted with a 1:1 combination of positive test data and random negative data. The Matthews correlation coefficient of SUMOgo reached 0.511, which is higher than that of current commonly used tools. This study further verified the important role of PTM in SUMOgo and includes a case study on CREB binding protein (CREBBP). The website for the final tool is http://predictor.nchu.edu.tw/SUMOgo.
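The evaluation described above relies on two standard pieces: drawing random negatives to match the positives 1:1, and scoring predictions with the Matthews correlation coefficient, MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below shows both in plain Python. The function names, the seeded random generator, and the example counts are assumptions for illustration and are not taken from SUMOgo.

```python
import math
import random


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero, a common convention.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def balanced_sample(positives, negatives, rng):
    """Pair all positives with an equal number of randomly drawn negatives.

    Requires at least as many negatives as positives.
    """
    positives = list(positives)
    drawn = rng.sample(list(negatives), len(positives))
    return positives, drawn


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated cycles are reproducible
    pos, neg = balanced_sample(range(5), range(100, 200), rng)
    print(len(pos), len(neg))
    # Made-up confusion-matrix counts, only to show the call.
    print(round(mcc(tp=40, tn=35, fp=15, fn=10), 3))
```

Repeating the sampling step with different draws and averaging the resulting MCC values would correspond to running multiple prediction cycles.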