2007
DOI: 10.1101/gr.6725608

Uncertainty in homology inferences: Assessing and improving genomic sequence alignment

Abstract: Sequence alignment underpins all of comparative genomics, yet it remains an incompletely solved problem. In particular, the statistical uncertainty within inferred alignments is often disregarded, while parametric or phylogenetic inferences are considered meaningless without confidence estimates. Here, we report on a theoretical and simulation study of pairwise alignments of genomic DNA at human-mouse divergence. We find that >15% of aligned bases are incorrect in existing whole-genome alignments, and we ident…

Cited by 142 publications (183 citation statements)
References 66 publications
“…However, at present, these methods require orders of magnitude more computational time than methods that assumed fixed alignments and are not feasible for use on a genome-wide scale. Still, it may be possible to use heuristic methods to substantially improve the speed of such methods (Bradley et al 2009; Paten et al 2009), or to quantify alignment uncertainty and then use this information in downstream functional element identification (Lunter et al 2008). In short, many opportunities remain for improving the biological realism, statistical power, and robustness of methods for identifying functional elements from comparative sequence data.…”
Section: Detection Of Nonneutral Substitution Rates (mentioning)
confidence: 99%
“…The gap can be formed as a manifestation of a high occurrence of indels (insertion and deletion). The occurrence of indels in bacteria can be caused by mutation, genetic recombination and transformation [22].…”
Section: Results (mentioning)
confidence: 99%
“…Based on these Forward calculations, we run the Backward algorithm and calculate posterior probabilities for every pair of residues in each of the three pairwise alignments per triplet. Finally, we use marginalized posterior decoding approach, as described in Lunter et al (2008), to identify high confidence triplets of homologous sites across all pairs, which are then used to estimate the substitution model using the standard Felsenstein tree likelihood (Felsenstein 1981). To estimate the indel parameters, and ε, we only use the set of pairwise alignments from our triplets and maximize the likelihood of state transitions given the previously estimated divergence times on the triplets.…”
Section: Pahmm-tree: Implementation and Optimization (mentioning)
confidence: 99%
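
The Forward/Backward step quoted in the statement above can be illustrated with a minimal sketch. It is not the implementation used by the citing paper or by Lunter et al (2008). It assumes a textbook three-state pair HMM (match, gap in y, gap in x) with no explicit end state, and every name and parameter value (`pair_hmm_posteriors`, `delta`, `eps`, `p_same`, `p_diff`, `q`) is an illustrative assumption. It computes, for each pair of residues, the posterior probability that they are aligned.

```python
M, X, Y = 0, 1, 2  # match state, x residue against a gap, y residue against a gap


def pair_hmm_posteriors(x, y, delta=0.1, eps=0.4, p_same=0.19, p_diff=0.02, q=0.25):
    """Posterior P(x[i] aligned to y[j] | x, y) under a toy three-state pair HMM.

    All parameter values are hypothetical defaults chosen for illustration:
    delta = gap-open probability, eps = gap-extend probability,
    p_same / p_diff = joint emission of identical / different bases
    (4 * p_same + 12 * p_diff = 1), q = emission of a base against a gap.
    """
    n, m = len(x), len(y)
    # t[a][b] = probability of moving from state a to state b
    t = [[1 - 2 * delta, delta, delta],
         [1 - eps, eps, 0.0],
         [1 - eps, 0.0, eps]]

    def emit(i, j):  # 1-based positions
        return p_same if x[i - 1] == y[j - 1] else p_diff

    # Forward: f[k][i][j] = P(x[:i], y[:j], path ends in state k)
    f = [[[0.0] * (m + 1) for _ in range(n + 1)] for _ in range(3)]
    f[M][0][0] = 1.0  # the start behaves like the match state
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                f[M][i][j] = emit(i, j) * sum(f[k][i - 1][j - 1] * t[k][M] for k in range(3))
            if i > 0:
                f[X][i][j] = q * sum(f[k][i - 1][j] * t[k][X] for k in range(3))
            if j > 0:
                f[Y][i][j] = q * sum(f[k][i][j - 1] * t[k][Y] for k in range(3))
    z_forward = sum(f[k][n][m] for k in range(3))

    # Backward: b[k][i][j] = P(x[i:], y[j:] | state k at cell (i, j))
    b = [[[0.0] * (m + 1) for _ in range(n + 1)] for _ in range(3)]
    for k in range(3):
        b[k][n][m] = 1.0
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i == n and j == m:
                continue
            for k in range(3):
                total = 0.0
                if i < n and j < m:
                    total += t[k][M] * emit(i + 1, j + 1) * b[M][i + 1][j + 1]
                if i < n:
                    total += t[k][X] * q * b[X][i + 1][j]
                if j < m:
                    total += t[k][Y] * q * b[Y][i][j + 1]
                b[k][i][j] = total
    z_backward = b[M][0][0]

    # Posterior match probability for every residue pair (0-based output)
    post = [[f[M][i][j] * b[M][i][j] / z_forward for j in range(1, m + 1)]
            for i in range(1, n + 1)]
    return post, z_forward, z_backward


if __name__ == "__main__":
    post, zf, zb = pair_hmm_posteriors("ACGT", "ACGT")
    for row in post:
        print(" ".join(f"{p:.3f}" for p in row))
```

Each row of `post` sums to at most 1, because a residue of `x` is aligned to at most one residue of `y` in any alignment. Low values mark pairs whose homology is uncertain. A real genome-scale implementation would work in log space or with scaling to avoid underflow.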