2008
DOI: 10.1177/117693430800400001

A Simple Derivation of the Distribution of Pairwise Local Protein Sequence Alignment Scores

Abstract: Confidence in pairwise alignments of biological sequences, obtained by various methods such as BLAST or Smith-Waterman, is critical for automatic analyses of genomic data. In the asymptotic limit of long sequences, the Karlin-Altschul model computes a P-value assuming that the number of high-scoring matching regions above a threshold is Poisson distributed. Using a simple approach combined with recent results in reliability theory, we demonstrate here that the Karlin-Altschul model can be derived with no refere…
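The Poisson assumption mentioned in the abstract can be illustrated with a minimal sketch. It uses the standard Karlin-Altschul relations, E = K·m·n·exp(−λS) for the expected number of high-scoring regions and P = 1 − exp(−E) for the probability of observing at least one. The function name and the numerical values of λ, K, and the sequence lengths are illustrative assumptions, not values taken from the paper.

```python
import math


def karlin_altschul_pvalue(score, m, n, lam, k):
    """Return (E-value, P-value) for a local alignment score.

    Assumes the standard Karlin-Altschul form: the number of
    high-scoring regions with score >= `score` is Poisson distributed
    with mean E = k * m * n * exp(-lam * score).
    """
    expected_hits = k * m * n * math.exp(-lam * score)
    # P(at least one hit) = 1 - exp(-E); expm1 keeps precision for tiny E.
    p_value = -math.expm1(-expected_hits)
    return expected_hits, p_value


# Illustrative (assumed) parameters: lam and k depend on the scoring system.
e_value, p_value = karlin_altschul_pvalue(score=60, m=250, n=300, lam=0.267, k=0.041)
print(f"E = {e_value:.3e}, P = {p_value:.3e}")
```

For small E the two quantities are nearly equal, which is why E-values and P-values are often used interchangeably for strong hits.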

Cited by 8 publications (11 citation statements)
References 23 publications
“…Considering that there seems to exist a general probability distribution class for sequence comparison scores 919 and that it seems (not the distribution parameters but only the qualitative shape, i.e., the class of the distribution) to be independent of the sequences, and hence of the time since their divergence, it is natural to search for a time-independent solution to equation (2) with the natural boundary conditions n(0) = 0, n(M) = 0, where M is the maximum of the genetic distance (here M = 1), and to verify with data whether this solution can be applied to our problem. Remembering that solutions outside this interval will have no biological meaning, there is no need to impose n(x) = 0 for x ∉ [0,1].…”
Section: Results (mentioning)
confidence: 99%
“…Then, the Gumbel distribution parameters λ and k of aligned sequence scores find a theoretical rationale. The first, λ, is the hazard rate of the distribution of scores between residues 19 and the second, k, is the probability that two aligned residues do not lose bits of information (i.e., conserve an initial pairing score) when a mutation occurs 18. This result also suggests that alignment score distributions could result from a purely evolutionary process.…”
Section: Introduction (mentioning)
confidence: 99%
“…It has recently been demonstrated that the high dimensionality of biological sequences leads to emergent computational features such as a similarity measure and a particular probability distribution of this similarity 2123. Distance methods are the first approach for phylogeny reconstruction and consist merely of considering the data as a point in an η-dimensional phase space (where η is, to a first approximation, the length of the sequence).…”
Section: Introduction (mentioning)
confidence: 99%
“…Nevertheless, the Z-values can be made very useful for computing accurate p-values via a "change of variable" technique [11]. More specifically, it has been shown that if the raw alignment scores follow a standard Gumbel law, then the p-values of associated Z-scores are free of sequence length and amino acid composition biases [12, 13]. Since the only drawback of this approach is the computational expense associated with random simulations, it would be very interesting to see whether the "change of variable" approach can be used in other settings.…”
Section: Introduction (mentioning)
confidence: 99%
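The "change of variable" idea in the statement above can be sketched as follows, assuming the raw scores follow a Gumbel law exactly and the Z-score is standardized with that law's true mean and standard deviation. A Gumbel variable with location μ and scale β has mean μ + γβ and standard deviation πβ/√6, so μ and β cancel and the p-value depends on Z alone. The function name is hypothetical, and in practice the mean and standard deviation would be estimated from shuffled sequences rather than known exactly.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def gumbel_pvalue_from_z(z):
    """P(Z >= z) when the underlying score is exactly Gumbel distributed.

    Assumes Z = (S - mean) / sd with the true Gumbel mean and sd, so the
    reduced variable is (S - mu) / beta = z * pi / sqrt(6) + gamma.
    """
    reduced = z * math.pi / math.sqrt(6.0) + EULER_GAMMA
    # Survival function of the standard Gumbel: 1 - exp(-exp(-x)).
    return -math.expm1(-math.exp(-reduced))


for z in (0.0, 3.0, 8.0):
    print(z, gumbel_pvalue_from_z(z))
```

Because the location and scale parameters drop out, the same mapping applies to any pair of sequences, which is the sense in which such p-values are independent of length and composition under the stated assumption.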
“…Recently, an interesting approach to alignment score normalization has been described that uses the so-called Shared Amount of Information (SAI) between amino acids [12]. The model proposed in [12] is unique since it is derived from reliability theory applied to sequences of amino acids.…”
Section: Introduction (mentioning)
confidence: 99%