2013
DOI: 10.1109/tit.2012.2236605

A Compression Model for DNA Multiple Sequence Alignment Blocks

Abstract: A particularly voluminous dataset in molecular genomics, known as whole genome alignments, has gained considerable importance in recent years. In this paper, we propose a compression modeling approach for the multiple sequence alignment (MSA) blocks, which make up most of these datasets. Our method is based on a mixture of finite-context models. Contrary to other recent approaches, it addresses both the DNA bases and gap symbols at once, better exploring the existing correlations. For comparison with pre…
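
To make the idea of a mixture of finite-context models concrete, the following is a minimal sketch and not the authors' implementation. It assumes a five-symbol alphabet (the four bases plus the gap symbol), additive smoothing, and a simple performance-based reweighting rule. The names `FiniteContextModel` and `estimate_bits`, the model orders, and the `decay` parameter are all hypothetical choices made for illustration. Instead of running an arithmetic coder, the sketch sums the ideal code length of -log2(p) bits per symbol.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT-"  # four DNA bases plus the alignment gap symbol (assumed alphabet)


class FiniteContextModel:
    """Order-k model: predicts the next symbol from the k preceding symbols."""

    def __init__(self, order, alpha=1.0):
        self.order = order
        self.alpha = alpha  # additive smoothing constant (assumed value)
        self.counts = defaultdict(lambda: dict.fromkeys(ALPHABET, 0))

    def _context(self, history):
        # An order-0 model has an empty context; history[-0:] would be wrong.
        return history[-self.order:] if self.order else ""

    def probabilities(self, history):
        counts = self.counts[self._context(history)]
        total = sum(counts.values()) + self.alpha * len(ALPHABET)
        return {s: (counts[s] + self.alpha) / total for s in ALPHABET}

    def update(self, history, symbol):
        self.counts[self._context(history)][symbol] += 1


def estimate_bits(sequence, orders=(1, 2, 4), decay=0.9):
    """Estimate the ideal code length, in bits, of `sequence` under the mixture."""
    models = [FiniteContextModel(k) for k in orders]
    weights = [1.0 / len(models)] * len(models)
    max_order = max(orders)
    total_bits = 0.0
    for i, symbol in enumerate(sequence):
        history = sequence[max(0, i - max_order):i]
        predictions = [m.probabilities(history) for m in models]
        # Mixture probability: weighted average of the per-model probabilities.
        mixed = sum(w * p[symbol] for w, p in zip(weights, predictions))
        total_bits += -math.log2(mixed)
        # Hypothetical reweighting: favour models that predicted this symbol well.
        weights = [(w ** decay) * p[symbol] for w, p in zip(weights, predictions)]
        norm = sum(weights)
        weights = [w / norm for w in weights]
        for m in models:
            m.update(history, symbol)
    return total_bits


if __name__ == "__main__":
    column = "ACGT-" * 40  # toy, highly repetitive input
    print(f"{estimate_bits(column) / len(column):.3f} bits per symbol")
```

Because the gap symbol is part of the same alphabet as the bases, each context can mix bases and gaps, which is one simple way to model both "at once" as the abstract describes.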

Citations: cited by 13 publications (14 citation statements)
References: 27 publications
“…Recently, we proposed a compression method based on a mixture of finite-context models and arithmetic coding, for compressing the MSABs [11, 28]. This compressor algorithm was designed to compress only the DNA bases and alignment gaps of the ‘s’ lines, without considering other information, nor the possible presence of lower case DNA bases.…”
Section: Methods
Citation type: mentioning
confidence: 99%
“…Recently, we have proposed an algorithm, based on a mixture of finite-context models and arithmetic coding, for compressing only the MSABs (Multiple Sequence Alignment Blocks) of the MAF [16] files [11]. This algorithm was only designed to compress the DNA bases and alignment gaps that are present in the MSABs.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…Some algorithms have been proposed for their lossy and lossless compression [79][80][81]. Moreover, the Variant Call Format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations [82].…”
Section: File Formats
Citation type: mentioning
confidence: 99%