2015
DOI: 10.1038/srep07972

Distinguishing Proteins From Arbitrary Amino Acid Sequences

Abstract: What kinds of amino acid sequences could possibly be protein sequences? From all existing databases that we can find, known proteins are only a small fraction of all possible combinations of amino acids. Beginning with Sanger's first detailed determination of a protein sequence in 1952, previous studies have focused on describing the structure of existing protein sequences in order to construct the protein universe. No one, however, has developed a criteria for determining whether an arbitrary amino acid seque…

Citations: Cited by 8 publications (4 citation statements)
References: 15 publications
“…Notably, the traditional Natural Vector, a probabilistic approach, illustrates the 12-dimensional nucleotide distributions, including the counts, mean locations, and normalized central moments of each nucleotide. The Natural Vector method and its extended versions have been applied to many studies and achieve high accuracy in sequence classification and phylogeny [6], [7], [8]. Here we apply the Natural Vector method with high order central moments to construct the genome space and combine k-mer theory and Natural Vector to define the new metric.…”
Section: Introduction (mentioning)
confidence: 99%
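
The 12-dimensional Natural Vector mentioned in the statement above (counts, mean locations, and normalized second central moments for each of the four nucleotides) can be sketched as follows. This is a minimal illustration, not the cited authors' code: the function name `natural_vector`, the 1-indexed positions, and the normalization of the second moment by `n_k * n` (nucleotide count times sequence length) are assumptions based on the commonly stated definition, since the quoted passage does not give the formula.

```python
from typing import List

ALPHABET = "ACGT"  # assumed nucleotide order for the output vector


def natural_vector(seq: str, alphabet: str = ALPHABET) -> List[float]:
    """Return [counts..., mean positions..., normalized 2nd central moments...].

    Assumptions (not stated in the quoted text): positions are 1-indexed,
    and the second central moment of letter k is divided by n_k * n,
    where n_k is the count of k and n is the sequence length.
    Letters that never occur contribute 0 to all three components.
    """
    seq = seq.upper()
    n = len(seq)
    counts: List[float] = []
    means: List[float] = []
    moments: List[float] = []
    for letter in alphabet:
        # 1-indexed positions at which this letter occurs
        positions = [i + 1 for i, ch in enumerate(seq) if ch == letter]
        n_k = len(positions)
        if n_k == 0:
            counts.append(0.0)
            means.append(0.0)
            moments.append(0.0)
            continue
        mu_k = sum(positions) / n_k
        d2_k = sum((p - mu_k) ** 2 for p in positions) / (n_k * n)
        counts.append(float(n_k))
        means.append(mu_k)
        moments.append(d2_k)
    return counts + means + moments


if __name__ == "__main__":
    print(natural_vector("AATT"))
```

Two sequences can then be compared by the Euclidean distance between their vectors. The same construction extends to proteins by replacing the 4-letter alphabet with the 20 amino acids, which gives a 60-dimensional vector.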
“…Motivated by the protein map, they also develop a novel method, with the name of protein space, to realize the nature of protein universe [24]. Their method is applied successfully in their following papers and proved effective [25,26]. He et al present a new way of generalized Chaos Game Representation (CGR) method to outline a dynamic 3D graphical representation [27] which is analogous to the original CGR method proposed by Jeffrey for graphical representation of DNA [3].…”
Section: Introduction (mentioning)
confidence: 99%
“…This group of proteins was well classified in this space (according to theoretical expectations), despite being difficult to classify with tree‐constructing methods. Residue sequences that fold to proteins seemed to occupy a specific part of this space, which led to a possible test of the potential for a given sequence to code for a functional protein (Yau et al., 2015). The boundaries of the space occupied by proteins were found to be reasonably resistant to change with new protein discoveries.…”
Section: Network (mentioning)
confidence: 99%