2014
DOI: 10.1016/j.ygeno.2014.08.007

SeqCompress: An algorithm for biological sequence compression

Abstract: The growth of Next Generation Sequencing technologies presents significant research challenges, specifically to design bioinformatics tools that handle massive amount of data efficiently. Biological sequence data storage cost has become a noticeable proportion of total cost in the generation and analysis. Particularly increase in DNA sequencing rate is significantly outstripping the rate of increase in disk storage capacity, which may go beyond the limit of storage capacity. It is essential to develop algorith…

citations
Cited by 20 publications
(13 citation statements)
references
References 14 publications
“…In gene prediction, the information entropy for a sequence property can also be calculated as an optimized feature for machine learning methods [167,195]. Additionally, other signal processing concepts, such as digital filters [196], data compression algorithms [197], and so on, are often seen in gene prediction since they have a natural link in terms of numeric analysis and pattern recognition. In the future, techniques derived from the information theory will perhaps play more important roles in genome analysis.…”
Section: Cloud and Parallel Computing
mentioning
confidence: 99%
“…Consequently, all these data pose many challenges for bioinformatics researchers, i.e., storing, sharing, fast-searching and performing operations on this large amount of genomic data become costly, requires enormous space, and has a large computation overhead for the encoding and decoding process [16]. Sometimes the cost of storage exceeds other costs, which means that storage is the primary requirement for unprocessed data [9]. To overcome these challenges in an efficient way, compression may be a perfect solution to the increased storage space issue of DNA sequences [3,21]. It is required to reduce the storage size and the processing costs, as well as aid in fast searching retrieval information, and increase the transmission speed over the internet with limited bandwidth [13,22].…”
mentioning
confidence: 99%
“…Thus, with looking at the importance of data compression, lossless compression methods are recommended for various DNA file formats such as FASTA and FASTQ file formats. Currently, universal text compression algorithms, including gzip [27] and bzip [28], are not efficient in the compression of genomic data because these algorithms were designed for the compression of English text. Besides, the DNA sequences consist of four bases: two bits should be sufficient to store each base and follow no clear rules like those of text files that cannot provide proper compression results [9,29]. For example, bzip2 can compress 9.7 MB of data to 2.8 MB (the compression ratio (CR) is significantly higher than 2 bits per base (bpb)). Nevertheless, this is far from satisfactory in terms of compression efficiency.…”
mentioning
confidence: 99%
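
The "two bits per base" baseline mentioned in the statement above can be illustrated with a minimal Python sketch. It is not the SeqCompress algorithm, and the names `pack`, `unpack`, and `bits_per_base` are hypothetical helpers introduced only for this illustration. The bpb figure assumes the quoted 9.7 MB input stores one base per byte and ignores headers and line breaks, which the statement does not confirm.

```python
# Illustrative sketch only: naive 2-bit packing of A/C/G/T, not SeqCompress.
# All names here are hypothetical helpers, not part of any cited tool.
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq: str) -> bytes:
    """Pack an A/C/G/T string into 2 bits per base (4 bases per byte)."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODES[base]
        # Left-align a short final chunk so unpack can read fixed positions.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, n_bases: int) -> str:
    """Recover the first n_bases bases from 2-bit packed data."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 3])
    return "".join(bases[:n_bases])


def bits_per_base(compressed_bytes: float, n_bases: float) -> float:
    """Bits per base, assuming the sizes are in the same unit."""
    return 8 * compressed_bytes / n_bases


if __name__ == "__main__":
    seq = "ACGTACGTA"
    packed = pack(seq)
    print(len(seq), "bases ->", len(packed), "bytes")
    print(unpack(packed, len(seq)) == seq)
    # Quoted bzip2 example, assuming one base per byte in the 9.7 MB input.
    print(round(bits_per_base(2.8, 9.7), 2), "bpb")
```

Under that one-byte-per-base assumption, the quoted bzip2 result works out to roughly 2.3 bpb, which is worse than the fixed 2 bpb that plain packing achieves; this is the gap that specialised DNA compressors aim to improve on.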