Predicting residue–residue contacts using random forest models

Li, Yunqi; Fang, Yaping; Fang, Jianwen

doi:10.1093/bioinformatics/btr579

Cited by 53 publications

(48 citation statements)

References 37 publications


“…RF has been extensively used in bioinformatics applications, including the prediction of disease-causing mutations [8,47-49]. The popularity of RF is due in part to its simplicity with no fine-tuning of parameters required and in part to its speed of classification, which is often faster than an equivalent SVM model [50]. In this study, as we are combining multiple classification models and evaluating different training sets, this advantage of RF (limited tuning required) over SVM (tuning required) was considerable.…”

Section: Methods (mentioning)

confidence: 99%

MutPred Splice: machine learning-based prediction of exonic variants that disrupt splicing

et al. 2014


We have developed a novel machine-learning approach, MutPred Splice, for the identification of coding region substitutions that disrupt pre-mRNA splicing. Applying MutPred Splice to human disease-causing exonic mutations suggests that 16% of mutations causing inherited disease and 10 to 14% of somatic mutations in cancer may disrupt pre-mRNA splicing. For inherited disease, the main mechanism responsible for the splicing defect is splice site loss, whereas for cancer the predominant mechanism of splicing disruption is predicted to be exon skipping via loss of exonic splicing enhancers or gain of exonic splicing silencer elements. MutPred Splice is available at http://mutdb.org/mutpredsplice.


“…Our previous work [51,52] has indicated that random forest and Support Vector Machine (SVM) usually demonstrate good performance with various datasets. This finding is consistent with the recently published work of Fernandez-Delgado et al. [53].…”

Section: Methods (mentioning)

confidence: 99%

“…In this work, random forest was applied because it is generally more robust than SVM, which is a parameter-sensitive method and requires a long period of time to optimize parameters. The random forest package in R software was used in this study, as in our previous study [52]. The ntree parameter was set to 5,000, which historically has demonstrated good performance [51,52], and the importance was set to TRUE.…”

Section: Methods (mentioning)

confidence: 99%

“…The random forest package in R software was used in this study, as in our previous study [52]. The ntree parameter was set to 5,000, which historically has demonstrated good performance [51,52], and the importance was set to TRUE. To build a robust model, the Pareto optimization rule [27] was applied, which favors a good model with better performance and fewer numbers of features.…”

Section: Methods (mentioning)

confidence: 99%


In silico identification of enhancers on the basis of a combination of transcription factor binding motif occurrences

Fang, Wang, Zhu et al. 2016

Sci Rep

Self Cite


Enhancers interact with gene promoters and form chromatin looping structures that serve important functions in various biological processes, such as the regulation of gene transcription and cell differentiation. However, enhancers are difficult to identify because they generally do not have fixed positions or consensus sequence features, and biological experiments for enhancer identification are costly in terms of labor and expense. In this work, several models were built by using various sequence-based feature sets and their combinations for enhancer prediction. The selected features derived from a recursive feature elimination method showed that the model using a combination of 141 transcription factor binding motif occurrences from 1,422 transcription factor position weight matrices achieved a favorably high prediction accuracy superior to that of other reported methods. The models demonstrated good prediction accuracy for different enhancer datasets obtained from different cell lines/tissues. In addition, prediction accuracy was further improved by integration of chromatin state features. Our method is complementary to wet-lab experimental methods and provides an additional method to identify enhancers.
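The citation statement quoted for this paper mentions a Pareto optimization rule that favors models with better performance and fewer features. As a rough illustration only, and not the cited authors' procedure, the idea can be sketched as a two-criterion Pareto front. The `Candidate` type, the function names, and every accuracy and feature count below are hypothetical values invented for the example.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """A hypothetical trained model summarized by two criteria."""
    name: str         # illustrative label, not from the cited paper
    accuracy: float   # higher is better
    n_features: int   # fewer is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both criteria and strictly better on one."""
    no_worse = a.accuracy >= b.accuracy and a.n_features <= b.n_features
    strictly_better = a.accuracy > b.accuracy or a.n_features < b.n_features
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep every candidate that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Made-up numbers, chosen only to exercise the rule.
candidates = [
    Candidate("A", 0.90, 300),
    Candidate("B", 0.89, 120),
    Candidate("C", 0.85, 200),  # dominated by B: less accurate, more features
    Candidate("D", 0.80, 50),
]
front = pareto_front(candidates)
print([c.name for c in front])
```

A final model would still have to be picked from the front by some further preference, for example the smallest feature set within a tolerated accuracy loss; the quoted statement does not say how that choice was made.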


“…Li et al [40] developed ProC_S3, based on a set of Random Forest algorithm based models using 1287 sequence-based features. Marks et al [43] use a global model of maximum entropy constrained by correlated mutations from multiple sequence alignments.…”

Section: Contact Map Prediction (mentioning)

confidence: 99%

Evolutionary decision rules for predicting protein contact maps

Chamorro, Asencio-Cortés, Divina et al. 2012

Pattern Anal Applic


Protein structure prediction is currently one of the main open challenges in Bioinformatics. The protein contact map is a useful and commonly used representation of protein 3D structure, and represents binary proximities (contact or non-contact) between each pair of amino acids of a protein. In this work, we propose a multiobjective evolutionary approach for contact map prediction based on physico-chemical properties of amino acids. The evolutionary algorithm produces a set of decision rules that identifies contacts between amino acids. The rules obtained by the algorithm impose a set of conditions based on amino acid properties to predict contacts. We present results obtained by our approach on four different protein data sets. A statistical study was also performed to extract valid conclusions from the set of prediction rules generated by our algorithm. Results obtained confirm the validity of our proposal.
