2007
DOI: 10.1016/j.future.2006.07.016

The configuration space of homologous proteins: A theoretical and practical framework to reduce the diversity of the protein sequence space after massive all-by-all sequence comparisons

Cited by 7 publications (8 citation statements)
References 59 publications
“…This property is an advantage for the clustering of large databases of biological sequences, since the addition of new sequences does not necessarily require the recalculation of previous alignments. This is why different clustering methods based on pairwise comparisons of proteins have been proposed, using either E-value (COG [6], TribeMCL [7], ProtoNet [8], ProtoMap [9], SIMAP [10], SYSTERS [11]) or Z-value statistics (Decrypthon [12], TeraProt [12], PhytoProt [13], CluSTr [14]). Recent use of PAB classification for an automatic inference of phylogeny includes OrthoMCL [15], based on pairwise BLAST comparisons and the computation of evolutionary distance based on E-value statistics (for review, [12]).…”
Section: Introduction
mentioning
confidence: 99%
“…In previous case studies, TULIP trees were shown to be consistent with phylogenetic trees [23]. The higher accuracy of Z-value over E-value statistics has been discussed and tested [12, 23, 24, 25]. In particular, Z-value statistics are valid when comparing sequences of very different amino acid compositions, an interesting feature to help the analysis of compositionally biased sequences.…”
Section: Introduction
mentioning
confidence: 99%
“…Automatic analysis of biological sequences is crucial for the treatment of massive genomic outputs. Our understanding of more than 90% of protein sequences stored in public databases, deduced from automatic translation of gene sequences, will not result from direct experimentation, but from our ability to predict informative features using in silico workflows [1, 2]. An underlying postulate is that the molecular sequences determined in biological individuals or species, which have evolved from a common ancestor sequence and are therefore homologous, have conserved enough of the original features to be similar.…”
Section: Introduction
mentioning
confidence: 99%
“…However, such manual pretreatment cannot be undertaken for all known genes. Alternatively, high throughput molecular phylogenies can be derived from massive all-against-all comparisons, based on pairwise alignments [68]. The questions of the statistical accuracy and maintenance of high throughput phylogenetic reconstruction are critical when including compositionally atypical and high insert containing sequences.…”
Section: Introduction
mentioning
confidence: 99%
“…As mentioned above, and discussed recently [68] for massive comparisons based on BlastP/E-values, i.e., COG [71], Tribe [72], ProtoMap [73], ProtoNet [74], SIMAP [75] and SYSTERS release 4 [76], there is no theoretical support to justify that an E-value table can be converted into a rigorous and stable metric.…”
Section: Introduction
mentioning
confidence: 99%