Finite-state models in the alignment of macromolecules

Allison, Lloyd; Wallace, Chris S.; Yee, Chut N.

doi:10.1007/bf00160262

Cited by 57 publications (57 citation statements)

References 34 publications


“…The machine could be made to match or recognize a particular family of sequences but only if it were given one or more examples prepended to the data to be searched. To an extent, it models the process by which a sequence could be generated and it is natural to use the term machine for this reason, and also because it is common in compression, makes a distinction with the other kind of HMM and is consistent with earlier work (Allison et al, 1992).…”

Section: Approximate Repeats (mentioning; confidence: 74%)

“…The possibility of changes, insertions and deletions in a repeat allow instances to differ. In essence, states R, R2 and R3 embody a simple mutation machine, as can be used in the sequence alignment problem (Allison et al, 1992), here used to allow local alignments of the sequence with itself. For the analysis of DNA, approximate reverse complementary repeats are allowed by a further set of states R%, R2% and R3% and corresponding operations, not shown.…”

Section: Approximate Repeats (mentioning; confidence: 99%)
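A "simple mutation machine" of the kind mentioned above can be illustrated with a one-state model whose steps are match, change, insert and delete. The sketch below is not the authors' implementation: the operation probabilities (`P_MATCH`, `P_CHANGE`, `P_INSERT`, `P_DELETE`), the uniform base distribution and the function names are hypothetical choices for illustration. It sums the probability of every alignment of two DNA strings with a forward dynamic-programming recurrence and converts the total to a cost in bits. Plain floats are used, so long sequences would need log-space arithmetic to avoid underflow.

```python
import math

# Hypothetical per-step operation probabilities (they sum to 1); not the paper's values.
P_MATCH, P_CHANGE, P_INSERT, P_DELETE = 0.7, 0.1, 0.1, 0.1
ALPHABET_SIZE = 4  # DNA, with a uniform base distribution assumed


def pair_prob(x: str, y: str) -> float:
    """Probability that one match-or-change step emits the column (x, y)."""
    if x == y:
        return P_MATCH / ALPHABET_SIZE
    # a change emits a source base, then one of the other three bases
    return P_CHANGE / (ALPHABET_SIZE * (ALPHABET_SIZE - 1))


def related_prob(a: str, b: str) -> float:
    """Total probability of all alignments of a with b (forward algorithm)."""
    n, m = len(a), len(b)
    # f[i][j] = summed probability of every alignment of a[:i] with b[:j]
    f = [[0.0] * (m + 1) for _ in range(n + 1)]
    f[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:  # match or change column
                f[i][j] += f[i - 1][j - 1] * pair_prob(a[i - 1], b[j - 1])
            if i > 0:  # delete: a[i-1] has no partner in b
                f[i][j] += f[i - 1][j] * P_DELETE / ALPHABET_SIZE
            if j > 0:  # insert: b[j-1] has no partner in a
                f[i][j] += f[i][j - 1] * P_INSERT / ALPHABET_SIZE
    return f[n][m]


def message_length_bits(a: str, b: str) -> float:
    """Cost in bits of encoding the pair (a, b) with this machine."""
    return -math.log2(related_prob(a, b))


if __name__ == "__main__":
    print(message_length_bits("ACGTACGT", "ACGTTCGT"))
    print(message_length_bits("ACGTACGT", "TGCATGCA"))
```

Summing over alignments, rather than keeping only the single best one, means that several near-optimal alignments all contribute to the evidence that two sequences are related.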

“…For example, linear costs for gaps (indels) within repeats can be modeled by states and operations for start-insert and continue-insert etc. as in the sequence alignment problem (Allison et al, 1992). One can even envisage a systematic search through simple machines to complex machines.…”

Section: Approximate Repeats (mentioning; confidence: 99%)
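The start-insert / continue-insert idea can be sketched as a three-state machine (match, insert, delete) in which opening a gap and extending it have separate probabilities. Under such a machine the cost in bits of a gap is a fixed opening cost plus a constant per extra base, i.e. linear in the gap length. `P_SAME`, `P_OPEN`, `P_EXTEND`, `Q` and the function names are hypothetical values and names chosen for illustration, and direct insert-to-delete transitions are omitted for simplicity.

```python
import math

# Hypothetical parameters for a three-state (match, insert, delete) machine.
P_SAME = 0.9    # chance a match-state column pairs equal bases
P_OPEN = 0.05   # M -> I or M -> D: "start-insert" / "start-delete"
P_EXTEND = 0.8  # I -> I or D -> D: "continue-insert" / "continue-delete"
Q = 0.25        # uniform DNA base probability


def emit_pair(x: str, y: str) -> float:
    """Emission probability of the column (x, y) in the match state."""
    return Q * P_SAME if x == y else Q * (1 - P_SAME) / 3


def forward_affine(a: str, b: str) -> float:
    """Total probability of all alignments of a with b under the three-state machine."""
    n, m = len(a), len(b)
    match = [[0.0] * (m + 1) for _ in range(n + 1)]
    insert = [[0.0] * (m + 1) for _ in range(n + 1)]
    delete = [[0.0] * (m + 1) for _ in range(n + 1)]
    match[0][0] = 1.0  # start in the match state before any column is emitted
    t_mm = 1 - 2 * P_OPEN  # stay in the match state
    t_close = 1 - P_EXTEND  # end a gap: I -> M or D -> M
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                match[i][j] = emit_pair(a[i - 1], b[j - 1]) * (
                    match[i - 1][j - 1] * t_mm
                    + (insert[i - 1][j - 1] + delete[i - 1][j - 1]) * t_close
                )
            if j > 0:  # insert b[j-1]: open from M or continue from I
                insert[i][j] = Q * (match[i][j - 1] * P_OPEN + insert[i][j - 1] * P_EXTEND)
            if i > 0:  # delete a[i-1]: open from M or continue from D
                delete[i][j] = Q * (match[i - 1][j] * P_OPEN + delete[i - 1][j] * P_EXTEND)
    return match[n][m] + insert[n][m] + delete[n][m]


def gap_bits(length: int) -> float:
    """Bits to encode `length` bases of a against an empty b, i.e. one gap."""
    return -math.log2(forward_affine("A" * length, ""))


if __name__ == "__main__":
    for length in range(1, 5):
        print(length, round(gap_bits(length), 3))
```

Each extra base in the gap adds the same amount, -log2(P_EXTEND * Q) bits, while the first base also pays the opening cost -log2(P_OPEN * Q).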

“…For short sequences one can perform a second backwards pass through the repeat graph and thus calculate the probability of the true path going through each node. This is analogous to the forward-backward dynamic programming algorithm which yields alignment density plots in sequence alignment (Allison et al, 1992). However, it requires either O(n²) space or greater time-complexity and is impractical for long sequences.…”

Section: Approximate Repeats (mentioning; confidence: 99%)
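The forward-backward idea can be illustrated with a one-state mutation machine. A forward pass gives the total probability of aligning each pair of prefixes, a backward pass does the same for suffixes, and their product for a given column, divided by the overall total, is the posterior probability that a[i] is aligned with b[j]. A matrix of these values is what an alignment density plot displays. The operation probabilities and function names are hypothetical, and the two full (n+1)×(m+1) tables make the quadratic space cost explicit.

```python
# Hypothetical operation probabilities for a one-state mutation machine.
P_MATCH, P_CHANGE, P_INSERT, P_DELETE = 0.7, 0.1, 0.1, 0.1
Q = 0.25  # uniform DNA base probability


def pair_prob(x: str, y: str) -> float:
    """Probability that a match-or-change step emits the column (x, y)."""
    return P_MATCH * Q if x == y else P_CHANGE * Q / 3


def alignment_density(a: str, b: str) -> list[list[float]]:
    """Posterior probability that a[i] is aligned with b[j], over all alignments."""
    n, m = len(a), len(b)
    fwd = [[0.0] * (m + 1) for _ in range(n + 1)]  # prefixes a[:i], b[:j]
    bwd = [[0.0] * (m + 1) for _ in range(n + 1)]  # suffixes a[i:], b[j:]
    fwd[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                fwd[i][j] += fwd[i - 1][j - 1] * pair_prob(a[i - 1], b[j - 1])
            if i > 0:
                fwd[i][j] += fwd[i - 1][j] * P_DELETE * Q
            if j > 0:
                fwd[i][j] += fwd[i][j - 1] * P_INSERT * Q
    bwd[n][m] = 1.0
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i < n and j < m:
                bwd[i][j] += pair_prob(a[i], b[j]) * bwd[i + 1][j + 1]
            if i < n:
                bwd[i][j] += P_DELETE * Q * bwd[i + 1][j]
            if j < m:
                bwd[i][j] += P_INSERT * Q * bwd[i][j + 1]
    total = fwd[n][m]  # equals bwd[0][0]
    # column (a[i], b[j]): paths reaching it, times the step, times paths leaving it
    return [
        [fwd[i][j] * pair_prob(a[i], b[j]) * bwd[i + 1][j + 1] / total for j in range(m)]
        for i in range(n)
    ]


if __name__ == "__main__":
    for row in alignment_density("ACGTA", "ACGA"):
        print(" ".join(f"{p:.2f}" for p in row))
```

Each row sums to at most 1; the shortfall is the probability that a[i] is deleted rather than aligned with any base of b.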


Sequence complexity for biological sequence analysis

Allison, Stern, Edgoose et al. (2000). Computers & Chemistry. Self-citation.

A new statistical model for DNA considers a sequence to be a mixture of regions with little structure and regions that are approximate repeats of other subsequences, i.e. instances of repeats do not need to match each other exactly. Both forward- and reverse-complementary repeats are allowed. The model has a small number of parameters which are fitted to the data. In general there are many explanations for a given sequence and how to compute the total probability of the data given the model is shown. Computer algorithms are described for these tasks. The model can be used to compute the information content of a sequence, either in total or base by base. This amounts to looking at sequences from a data-compression point of view and it is argued that this is a good way to tackle intelligent sequence analysis in general.
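Information content "base by base" can be made concrete as the cost -log2 P(x_i | preceding bases) under some predictive model, and the total is the length of a lossless encoding of the sequence. As a much simpler stand-in for the approximate-repeat model described in the abstract, the sketch below uses an adaptive order-k Markov model with add-one counts; this substitution, `ALPHABET` and `information_content` are assumptions for illustration. A repetitive sequence should cost far fewer bits per base than an irregular one.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def information_content(seq: str, k: int = 2) -> list[float]:
    """Per-base cost in bits, -log2 P(base | previous k bases), under an adaptive model."""
    # counts[context][base]: how often `base` has followed `context` so far
    counts = defaultdict(lambda: dict.fromkeys(ALPHABET, 0))
    costs = []
    for i, base in enumerate(seq):
        context = seq[max(0, i - k):i]  # shorter contexts at the very start
        seen = counts[context]
        # add-one (Laplace) estimate, so unseen bases keep nonzero probability
        p = (seen[base] + 1) / (sum(seen.values()) + len(ALPHABET))
        costs.append(-math.log2(p))
        seen[base] += 1  # update only after coding, as a decoder could
    return costs


if __name__ == "__main__":
    for s in ("ACGT" * 50, "GATTACACCGTAGCTTAGGCATCAGTTGCA"):
        bits = information_content(s)
        print(f"{len(s)} bases, {sum(bits):.1f} bits, {sum(bits) / len(s):.2f} bits/base")
```

Because the counts are updated only after each base is costed, a decoder could rebuild the same model, so the summed cost is an achievable compressed length rather than just a score.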


“…As in [25,26], our work is based on the premise that if two sequences are related, one sequence must tell something useful about the other: A predictive model can predict a sequence better if a related sequence is known. The information content of a sequence is measured by lossless compression.…”

Section: Introduction (mentioning; confidence: 99%)

A genome alignment algorithm based on compression

2010


Background: Traditional genome alignment methods consider sequence alignment as a variation of the string edit distance problem, and perform alignment by matching characters of the two sequences. They are often computationally expensive and unable to deal with low information regions. Furthermore, they lack a well-principled objective function to measure the performance of sets of parameters. Since genomic sequences carry genetic information, this article proposes that the information content of each nucleotide in a position should be considered in sequence alignment. An information-theoretic approach for pairwise genome local alignment, namely XMAligner, is presented. Instead of comparing sequences at the character level, XMAligner considers a pair of nucleotides from two sequences to be related if their mutual information in context is significant. The information content of nucleotides in sequences is measured by a lossless compression technique.

Results: Experiments on both simulated data and real data show that XMAligner is superior to conventional methods especially on distantly related sequences and statistically biased data. XMAligner can align sequences of eukaryote genome size with only a modest hardware requirement. Importantly, the method has an objective function which can obviate the need to choose parameter values for high quality alignment. The alignment results from XMAligner can be integrated into a visualisation tool for viewing purpose.

Conclusions: The information-theoretic approach for sequence alignment is shown to overcome the mentioned problems of conventional character matching alignment methods. The article shows that, as genomic sequences are meant to carry information, considering the information content of nucleotides is helpful for genomic sequence alignment.

Availability: Downloadable binaries, documentation and data can be found at ftp://ftp.infotech.monash.edu.au/software/DNAcompress-XM/XMAligner/.
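The premise that a related sequence helps predict another can be tried with any lossless compressor: the information in B given A is approximated by the compressed size of A followed by B, minus the compressed size of A alone. The sketch uses Python's standard `zlib` module purely as a generic stand-in; it is not XMAligner's compressor and does not compute per-nucleotide mutual information. The helper names, sequence lengths, seeds and mutation rate are hypothetical.

```python
import random
import zlib


def compressed_bits(s: str) -> int:
    """Size in bits of s after zlib compression (a generic lossless compressor)."""
    return 8 * len(zlib.compress(s.encode("ascii"), 9))


def conditional_bits(b: str, a: str) -> int:
    """Approximate information in b given a as C(a + b) - C(a)."""
    return compressed_bits(a + b) - compressed_bits(a)


def random_dna(n: int, seed: int) -> str:
    """Random DNA string with uniform base composition."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(n))


def mutate(s: str, rate: float, seed: int) -> str:
    """Replace each base, with probability `rate`, by a random base (toy mutation model)."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") if rng.random() < rate else ch for ch in s)


if __name__ == "__main__":
    a = random_dna(3000, seed=1)
    related = mutate(a, 0.02, seed=2)
    unrelated = random_dna(3000, seed=3)
    for name, b in (("related", related), ("unrelated", unrelated)):
        saving = compressed_bits(b) - conditional_bits(b, a)
        print(f"{name}: knowing a saves {saving} bits")
```

For the mutated copy the saving is large, because the compressor can refer back to long stretches of a; for the independent sequence it is close to zero. zlib only looks back 32 KB, so this crude measure does not scale to genome-sized inputs.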


Effects of sequence alignment procedures on estimates of phylogeny

Goldman (1998). BioEssays.
