2017
DOI: 10.1038/srep46321

Complete fold annotation of the human proteome using a novel structural feature space

Abstract: Recognition of protein structural fold is the starting point for many structure prediction tools and protein function inference. Fold prediction is computationally demanding and recognizing novel folds is difficult such that the majority of proteins have not been annotated for fold classification. Here we describe a new machine learning approach using a novel feature space that can be used for accurate recognition of all 1,221 currently known folds and inference of unknown novel folds. We show that our method …

Cited by 4 publications
(9 citation statements)
References 35 publications
“…Full-length proteins were split into one or more predicted domains (where “domain” is defined as an amino acid chain that likely folds into a compact, independently stable tertiary structure; see “Methods”), yielding a total of 6845 domains. Each domain was classified into a SCOP structural fold using our PESS pipeline [30]. Using this approach, we were able to predict the fold of 2005 additional domains beyond previous structural annotation [31].…”
Section: Results (mentioning)
confidence: 99%
“…If a “filled in” region such as this was longer than 450 aa, we used a sliding window of 300 aa (slide = 150 aa) to break it into smaller pieces, since domains are rarely larger than this. The fold of each domain was predicted using the method described in [30]. A nearest neighbor distance threshold of ≤ 17.5 was used to designate “high confidence” predictions, and a more lenient threshold of ≤ 30 was used to designate “medium confidence” predictions.…”
Section: Methods (mentioning)
confidence: 99%
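The windowing and thresholding described in the Methods statement above can be sketched roughly as follows. This is an illustrative reading of the quoted text, not the citing authors' code: the function names, the half-open interval convention, the clipping of the final window at the region end, and the handling of distances above 30 are all assumptions made here. Only the numeric values (450, 300, 150, 17.5, 30) come from the quoted passage.

```python
from typing import List, Optional, Tuple

# Values taken from the quoted Methods statement.
MAX_REGION_LEN = 450    # regions longer than this (in aa) are split
WINDOW = 300            # sliding window length (aa)
SLIDE = 150             # step between window starts (aa)
HIGH_CONF_MAX = 17.5    # nearest neighbor distance for "high confidence"
MEDIUM_CONF_MAX = 30.0  # nearest neighbor distance for "medium confidence"


def split_region(start: int, end: int) -> List[Tuple[int, int]]:
    """Split a half-open residue interval [start, end) into windows.

    Hypothetical helper: regions of at most MAX_REGION_LEN residues are
    returned unchanged; longer ones are cut into overlapping windows.
    """
    if end - start <= MAX_REGION_LEN:
        return [(start, end)]
    windows = []
    pos = start
    while pos + WINDOW < end:
        windows.append((pos, pos + WINDOW))
        pos += SLIDE
    # Assumption: the last window is clipped at the region end.
    windows.append((pos, end))
    return windows


def confidence_label(distance: float) -> Optional[str]:
    """Map a nearest neighbor distance to a confidence label.

    Assumption: distances above MEDIUM_CONF_MAX get no label (None);
    the quoted text does not say how they are treated.
    """
    if distance <= HIGH_CONF_MAX:
        return "high"
    if distance <= MEDIUM_CONF_MAX:
        return "medium"
    return None


if __name__ == "__main__":
    print(split_region(0, 600))
    print(confidence_label(12.0), confidence_label(25.0))
```

Under these assumptions, a 600 aa region yields three windows that overlap by 150 aa, while a region of exactly 450 aa is left whole.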
“…Full length proteins were split into one or more predicted domains (where “domain” is defined as an amino acid chain that likely folds into a compact, independently stable tertiary structure; see Methods), yielding a total of 6,845 domains. Each domain was classified into a SCOP structural fold using our PESS pipeline (Middleton, Illuminati, and Kim 2017). Using this approach, we were able to predict the fold of 2,005 additional domains beyond previous structural annotation (Lees et al 2012).…”
Section: Results (mentioning)
confidence: 99%
“…If a “filled in” region such as this was longer than 450 aa, we used a sliding window of 300 aa (slide = 150 aa) to break it into smaller pieces, since domains are rarely larger than this. The fold of each domain was predicted using the method described in (Middleton, Illuminati, and Kim 2017). A nearest neighbor distance threshold of ≤ 17.5 was used to designate “high confidence” predictions, and a more lenient threshold of ≤ 30 was used to designate “medium confidence” predictions.…”
Section: Methods (mentioning)
confidence: 99%
“…Thus, these seven SUMO1 protein structures contain total of 17 folding conformations, and they are, respectively, converted into 17 of PFSC strings according their coordinates of alpha C‐atoms. The PFSC strings for fragment 21–44,50–89 for 17 folding conformations of SUMO1_HUMAN are aligned and listed in the top section of Table 5.…”
Section: Results (mentioning)
confidence: 99%