NoFold: RNA structure clustering without folding or alignment

Middleton, Sarah A.; Kim, Junhyong

doi:10.1261/rna.041913.113

“…we clustered stress-dependent P-body mRNAs based on secondary structures within the 3'UTR using NoFold (Middleton and Kim, 2014). In comparison to non-candidate mRNAs, each stress-specific candidate set contained 10-20 clusters of transcripts that were differentially enriched in certain structure motifs (Table S2) loop structures may favor P-body localization under stress.…”

Section: Puf5p Contributes To Both Recruitment and Decay Of P-body Mr…

Citation type: mentioning

confidence: 99%

Context-dependent deposition and regulation of mRNAs in P-bodies

Wang, Schmich, Weidner et al. 2017

Preprint

Cells respond to stress by remodeling their transcriptome through transcription and degradation. Xrn1p-dependent degradation in P-bodies is the most prevalent pathway. Yet, P-bodies may facilitate not only decay but also act as storage compartment. However, which and how mRNAs are selected into different degradation pathways and what determines the fate of any given mRNA in P-bodies remain largely unknown. We devised a new method to identify both common and stress-specific mRNA subsets associated with P-bodies. mRNAs targeted for degradation to P-bodies, decayed with different kinetics. Moreover, the localization of a specific set of mRNAs to...

“…We have previously applied this idea to RNA secondary structure analysis [14], and we show here that it can be adapted to proteins. The objects being compared are amino-acid sequences and the distance we would like to compute is similarity of tertiary structure.…”

Section: Results

Citation type: mentioning

confidence: 99%

Complete fold annotation of the human proteome using a novel structural feature space

Middleton, Illuminati, Kim 2017

Sci Rep

Self Cite

Recognition of protein structural fold is the starting point for many structure prediction tools and protein function inference. Fold prediction is computationally demanding and recognizing novel folds is difficult such that the majority of proteins have not been annotated for fold classification. Here we describe a new machine learning approach using a novel feature space that can be used for accurate recognition of all 1,221 currently known folds and inference of unknown novel folds. We show that our method achieves better than 94% accuracy even when many folds have only one training example. We demonstrate the utility of this method by predicting the folds of 34,330 human protein domains and showing that these predictions can yield useful insights into potential biological function, such as prediction of RNA-binding ability. Our method can be applied to de novo fold prediction of entire proteomes and identify candidate novel fold families.

“…Our approach is based on the idea of an empirical kernel [13], where the distance between two objects is computed by comparing each object to a set of empirical examples or models. We have previously applied this idea to RNA secondary structure analysis [14], and we show here that it can be adapted to proteins. The objects being compared are amino-acid sequences and the distance we would like to compute is similarity of tertiary structure.…”

Section: The Protein Empirical Structure Space (PESS)

Citation type: mentioning

confidence: 99%

Complete fold annotation of the human proteome using a novel structural feature space

Middleton, Illuminati, Kim 2016

Preprint

Self Cite

Recognition of protein structural fold is the starting point for many structure prediction tools and protein function inference. Fold prediction is computationally demanding and recognizing novel folds is difficult such that the majority of proteins have not been annotated for fold classification. Here we describe a new machine learning approach using a novel feature space that can be used for accurate recognition of all 1,221 currently known folds and inference of unknown novel folds. We show that our method achieves better than 94% accuracy even when many folds have only one training example. We demonstrate the utility of this method by predicting the folds of 34,330 human protein domains and showing that these predictions can yield useful insights into potential biological function, such as prediction of RNA-binding ability. Our method can be applied to de novo fold prediction of entire proteomes and identify candidate novel fold families.

Although protein sequences can theoretically form a vast range of structures, the number of distinct three-dimensional topologies ("folds") actually observed in nature appears to be both finite and relatively small [1]: 1,221 folds are currently recognized in the SCOPe (Structural Classification of Proteins-extended) database [2], and the rate of new fold discoveries has diminished greatly over the past two decades. Nevertheless, extending the catalog of protein fold diversity is still an important problem and fold classifying the entire proteome of an organism can lead to important insights about protein function [3][4][5].

Large-scale fold prediction typically involves computational methods, and the computational difficulty of ab initio structure prediction has led to template matching (e.g., using methods such as HHPred [6]) as the most common method for predicting the structure. When sequence-based matching is difficult, other fold recognition approaches must be employed, such as protein threading. Threading-based methods, especially those that combine information from multiple templates, have been among the most successful algorithms in recent competitions for fold prediction [7,8], but are bottlenecked by long run times. Machine learning-based methods have also been used, which can be designed either to recognize pairs of proteins with the same fold [9,10] or classify a protein into a fold [11,12]. Although these methods have shown promising results for a subset of folds, they have so far not been able to generalize to the full-scale fold recognition problem. This failure can mainly be attributed to the severe lack of training data available for most SCOPe folds, as well as the highly multi-class nature of the full problem, which requires distinguishing between over 1,000 different folds [12].

Here we introduce a method for full-scale fold recognition that integrates aspects of both threading and machine learning. At the core of our method is a novel feature space constructed by threading protein sequences against a relatively small set of structure templates. These templates...
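
The citing passages above describe an empirical kernel, in which the distance between two objects is computed by comparing each object to a fixed set of empirical examples or models. A minimal Python sketch of that general idea follows. It is an illustration only, not the NoFold or PESS implementation: the function names (`score`, `embed`, `empirical_distance`) are hypothetical, and the toy k-mer Jaccard score is an assumption standing in for whatever model or template score a real method would use.

```python
import math


def kmer_set(seq, k=2):
    """Return the set of length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def score(seq, model, k=2):
    """Toy stand-in for a model score (assumption): Jaccard similarity
    between the k-mer sets of a sequence and a reference model sequence."""
    a, b = kmer_set(seq, k), kmer_set(model, k)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def embed(seq, models):
    """Map a sequence to a feature vector with one score per model."""
    return [score(seq, m) for m in models]


def empirical_distance(x, y, models):
    """Euclidean distance between two sequences in the empirical feature space."""
    return math.dist(embed(x, models), embed(y, models))


if __name__ == "__main__":
    # Hypothetical reference models; real methods would use trained models or templates.
    models = ["ACGU", "GGCC"]
    print(embed("ACGU", models))
    print(empirical_distance("ACGU", "GGCC", models))
```

The point of the design is that two objects are never compared directly: each is first placed in a coordinate system defined by the reference set, so any standard vector distance or clustering method can then be applied to the resulting feature vectors.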

Cited by 17 publications

References 45 publications
