2018
DOI: 10.1101/443101
Preprint

A framework for space-efficient variable-order Markov models

Abstract: Motivation: Markov models with contexts of variable length are widely used in bioinformatics for representing sets of sequences with similar biological properties. When models contain many long contexts, existing implementations are either unable to handle genome-scale training datasets within typical memory budgets, or they are optimized for specific model variants and are thus inflexible. Results: We provide practical, versatile representations of variable-order Markov models and of interpolated Markov model…

Cited by 4 publications (4 citation statements)
References: 66 publications
“…The bidirectional index can then be used to construct both of the required tree topologies in O(n log σ) time using bidirectional extension operations and the counter-based topology construction method of Belazzougui et al. [38]. See the supplement of a paper on variable order Markov models by Cunial et al. [43] for more details on the construction of the tree topologies. The topologies are then indexed for various navigational operations required by the contraction operation described in [39].…”
Section: Deterministic Index Construction For Integer Alphabets
Type: mentioning
confidence: 99%
“…Given an array $\mathrm{MS}_{S,T}$ and a user-defined threshold $\tau$, let a thresholded matching statistics array $\mathrm{MS}_{S,T,\tau}$ be such that $\mathrm{MS}_{S,T,\tau}[i] = \mathrm{MS}_{S,T}[i]$ if $\mathrm{MS}_{S,T}[i] \geq \tau$, and $\mathrm{MS}_{S,T,\tau}[i]$ equals an arbitrary (possibly negative) value smaller than $\tau$ otherwise. This notion is symmetrical to the one defined by [9], which discards instead long MS values in order to prune the suffix tree topologies and to make the data structures smaller. Given an encoder $\delta$, we are interested in the $\mathrm{MS}_{S,T,\tau}$ array whose $\mathrm{ms}_{S,T,\tau}$ bitvector takes the smallest amount of space when encoded with $\delta$.…”
Section: Compressing the Ms Bitvector
Type: mentioning
confidence: 99%
“…The bidirectional index can then be used to construct both of the required tree topologies in O(n log σ) time using bidirectional extension operations and the counter-based topology construction method of Belazzougui et al. [28]. See the supplement of a paper on variable order Markov models by Cunial et al. [33] for more details on the construction of the tree topologies. The topologies are then indexed for various navigational operations required by the contraction operation described in [29].…”
Section: Deterministic Index Construction For Integer Alphabets
Type: mentioning
confidence: 99%