2015
DOI: 10.1002/prot.24936

Amino acid alphabet reduction preserves fold information contained in contact interactions in proteins

Abstract: To reduce complexity, understand generalized rules of protein folding, and facilitate de novo protein design, the 20-letter amino acid alphabet is commonly reduced to a smaller alphabet by clustering amino acids based on some measure of similarity. In this work, we seek the optimal alphabet that preserves as much as possible of the structural information found in long-range (contact) interactions among amino acids in natively folded proteins. We employ the Information Maximization Device, based on information theory, to p…
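One way to make "structural information found in contact interactions" concrete is the mutual information between the residue types at the two ends of each contact, computed after mapping residues onto a reduced alphabet. The sketch below is a hedged illustration under stated assumptions, not the Information Maximization Device itself: the contact list, the grouping dictionaries, and the function name `contact_mutual_information` are all hypothetical. Each contact is counted in both orientations so the joint distribution is symmetric.

```python
import math
from collections import Counter


def contact_mutual_information(contacts, grouping):
    """Mutual information (bits) between reduced residue types at contact ends.

    contacts: iterable of (aa_i, aa_j) one-letter codes for residues in contact.
    grouping: dict mapping each one-letter code to a reduced-alphabet symbol.
    HYPOTHETICAL helper for illustration; not the method of the paper.
    """
    joint = Counter()
    for a, b in contacts:
        x, y = grouping[a], grouping[b]
        # Count both orientations so the joint table is symmetric.
        joint[(x, y)] += 1
        joint[(y, x)] += 1
    total = sum(joint.values())

    # Row marginal; by symmetry it equals the column marginal.
    marginal = Counter()
    for (x, _), n in joint.items():
        marginal[x] += n

    mi = 0.0
    for (x, y), n in joint.items():
        p_xy = n / total
        p_x = marginal[x] / total
        p_y = marginal[y] / total
        mi += p_xy * math.log2(p_xy / (p_x * p_y))
    return mi


if __name__ == "__main__":
    # Toy, made-up contacts and a hypothetical two-letter grouping.
    toy = [("L", "I"), ("L", "V"), ("K", "E"), ("R", "D"), ("I", "V")]
    two_letter = {"L": "h", "I": "h", "V": "h", "K": "p", "E": "p", "R": "p", "D": "p"}
    identity = {a: a for a in "LIVKERD"}
    print(contact_mutual_information(toy, two_letter))
    print(contact_mutual_information(toy, identity))
```

Because a reduced alphabet is a deterministic function of the full one, merging letters can never increase this quantity; the optimization problem is to choose the merges that lose the least of it.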

Cited by 29 publications (37 citation statements)
References 61 publications

“…To perform translated search, Kraken 2X first builds a database from a set of reference proteins in the same manner that Kraken 2 does for nucleotide sequences. The usual alphabet of 20 amino acids is reduced to 15 using the 15-character alphabet of Solis [31]; we add a single additional value representing selenocysteine, pyrrolysine, and translation termination (stop codons). This gives us 16 characters in our reduced alphabet, allowing us to represent a character with 4 bits.…”
Section: Translated Search (mentioning; confidence: 99%)
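The arithmetic in the passage above is 15 reduced classes plus 1 shared special value, giving 16 = 2^4 symbols, so each residue fits in one 4-bit nibble. A minimal sketch of that packing follows. The Solis groups themselves are not given in this excerpt, so `HYPOTHETICAL_GROUPS` is a made-up 15-way partition used only to assign codes; it is NOT the alphabet Kraken 2X uses, and `pack` / `unpack_codes` are illustrative names, not Kraken 2X functions.

```python
# HYPOTHETICAL 15-group partition of the 20 standard amino acids, used only to
# illustrate 4-bit encoding; it is NOT the Solis alphabet cited in the passage.
HYPOTHETICAL_GROUPS = ["A", "C", "DE", "FY", "G", "H", "IV", "KR", "LM",
                       "N", "P", "Q", "S", "T", "W"]

# The 16th value: shared by selenocysteine (U), pyrrolysine (O), and stop (*).
SPECIAL_CODE = 15


def build_code_table(groups):
    """Map each one-letter code to a 4-bit group code (0..15)."""
    table = {}
    for code, members in enumerate(groups):
        for aa in members:
            table[aa] = code
    for aa in "UO*":
        table[aa] = SPECIAL_CODE
    return table


CODE = build_code_table(HYPOTHETICAL_GROUPS)


def pack(peptide):
    """Pack a peptide into one integer, 4 bits per residue, first residue highest."""
    value = 0
    for aa in peptide:
        value = (value << 4) | CODE[aa]
    return value


def unpack_codes(value, length):
    """Recover the list of 4-bit group codes from a packed integer."""
    return [(value >> (4 * (length - 1 - i))) & 0xF for i in range(length)]
```

Residues in the same group pack identically (here `pack("DE") == pack("ED")`), which is what lets a reduced alphabet tolerate conservative substitutions. At 4 bits per residue, a 15-residue string occupies at most 60 bits and so fits in a 64-bit word; this is an observation about the arithmetic, not a claim about Kraken 2X internals.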
“…For Kraken 2, we performed two parameter sweeps, with one focused on minimizer-based subsampling, and one focused on hash-based subsampling. The first parameter sweep looked at values for ℓ in the interval [25, 31], values for k in the interval [ℓ, ℓ+10], and values for s in the interval [0, 7]; the second parameter sweep looked at values of ℓ in the interval [25, 31], fixed k = ℓ, values for f in the set {0.125, 0.25, 0.5}, and values for s in the interval [0, 7]. We also performed a third parameter sweep, focused on translated search (Kraken 2X), where we looked at values for ℓ in the interval [11, 15], values for k in the interval [ℓ, ℓ+3], and values for s in the interval [0, 3].…”
Section: Parameter Sweeps (mentioning; confidence: 99%)
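The size of each sweep in the passage above follows from multiplying the number of values per parameter. The sketch below enumerates the three grids under an assumption the excerpt does not state: every interval is read as integers with inclusive endpoints, and every combination is included. If some combinations were skipped, these counts are upper bounds. The function names are hypothetical.

```python
from itertools import product


def sweep_minimizer():
    # ℓ in [25, 31], k in [ℓ, ℓ+10], s in [0, 7]  (assumed inclusive integers)
    return [(ell, k, s)
            for ell in range(25, 32)
            for k in range(ell, ell + 11)
            for s in range(0, 8)]


def sweep_hash():
    # ℓ in [25, 31], k fixed to ℓ, f in {0.125, 0.25, 0.5}, s in [0, 7]
    return [(ell, ell, f, s)
            for ell, f, s in product(range(25, 32), (0.125, 0.25, 0.5), range(0, 8))]


def sweep_translated():
    # ℓ in [11, 15], k in [ℓ, ℓ+3], s in [0, 3]
    return [(ell, k, s)
            for ell in range(11, 16)
            for k in range(ell, ell + 4)
            for s in range(0, 4)]


if __name__ == "__main__":
    for name, grid in [("minimizer", sweep_minimizer()),
                       ("hash", sweep_hash()),
                       ("translated", sweep_translated())]:
        print(name, len(grid))
```

Under that reading the grids contain 7 × 11 × 8 = 616, 7 × 3 × 8 = 168, and 5 × 4 × 4 = 80 configurations respectively.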
“…This idea is implicitly included in other simplifications. In practice, the sequence-structure mapping can be described by the information gain compared with random sequences [103], the ability to recognize protein families [104], the mutual information based on contact populations [105], and so on. Due to the complexity of the data involved, optimization methods are employed to search for the optimal groupings which Figure 5.…”
Section: Simplification Based On Features Of Protein Systems (mentioning; confidence: 99%)
“…Similarities are observed among the groupings obtained with various methods [105,115]. Meanwhile, some features may not be correctly described after simplification. For example, experiments and simulations suggest that the HP grouping cannot describe all the features of natural proteins [92,95], though it is a common feature of proteins.…”
Section: Minimal Groupings Of Protein Systems (mentioning; confidence: 99%)