2015
DOI: 10.1371/journal.pone.0132460

On-Demand Indexing for Referential Compression of DNA Sequences

Abstract: The decreasing costs of genome sequencing are creating a demand for scalable storage and processing tools and techniques to deal with the large amounts of generated data. Referential compression is one of these techniques, in which the similarity between the DNA of organisms of the same or an evolutionarily close species is exploited to reduce the storage demands of genome sequences up to 700 times. The general idea is to store in the compressed file only the differences between the to-be-compressed and a well-kn…
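
The general idea stated in the abstract, storing only the differences between a target sequence and a reference, can be illustrated with a minimal Python sketch. This is not the paper's on-demand indexing method: the k-mer index here is built eagerly over the whole reference, and the names `build_index`, `compress`, `decompress`, and the parameter `k` are hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Tuple, Union

# An entry is either a (reference offset, length) match or a literal base.
Entry = Union[Tuple[int, int], str]


def build_index(reference: str, k: int) -> Dict[str, List[int]]:
    """Map every k-mer of the reference to the offsets where it occurs."""
    index: Dict[str, List[int]] = {}
    for i in range(len(reference) - k + 1):
        index.setdefault(reference[i:i + k], []).append(i)
    return index


def compress(target: str, reference: str, k: int = 4) -> List[Entry]:
    """Greedily encode the target as reference matches plus literal bases."""
    index = build_index(reference, k)
    entries: List[Entry] = []
    t = 0
    while t < len(target):
        best_pos, best_len = -1, 0
        # Extend every candidate offset and keep the longest match.
        for r in index.get(target[t:t + k], []):
            length = 0
            while (r + length < len(reference)
                   and t + length < len(target)
                   and reference[r + length] == target[t + length]):
                length += 1
            if length > best_len:
                best_pos, best_len = r, length
        if best_len >= k:
            entries.append((best_pos, best_len))
            t += best_len
        else:
            # No usable match: store the base itself as a difference.
            entries.append(target[t])
            t += 1
    return entries


def decompress(entries: List[Entry], reference: str) -> str:
    """Rebuild the target from the entries and the same reference."""
    parts: List[str] = []
    for entry in entries:
        if isinstance(entry, str):
            parts.append(entry)
        else:
            pos, length = entry
            parts.append(reference[pos:pos + length])
    return "".join(parts)
```

With the reference `ACGTACGTTTGACCA` and a target that differs in a single base, `ACGTACGTATGACCA`, this sketch produces two matches and one literal, and `decompress` restores the target exactly. A real tool would also encode the entries compactly and avoid scanning every candidate offset.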

citations
Cited by 8 publications
(9 citation statements)
references
References 16 publications
“…There are several approaches to the implementation of Local Research behavior, the main ones being highlighted here (Fig 6 ): 1. Keep a pointer between reference and target (Fig 6A), and advance this pointer at the same time in reference and target for every match or mutation identified, as implemented by tool JDNA [84]. Another possibility, implemented by tool GDC 2.0, is to perform some punctual simple verifications before initiating the search for a larger segment, advancing the pointer in reference [34];…”
Section: First Order Mapping (mentioning)
confidence: 99%
“…Local search, that looks for short and local matches and a greedy search, that looks for the longest possible matches. It is important to notice that some tools implement more than one Local Search strategy, as RLZ-Opt [74] and JDNA [84], in which it is dynamically determined, during mapping, which local search strategies will be applied. GeCo and GReEn tools, despite not executing a mapping phase, use local search strategy to create and keep their statistical models updated.…”
Section: First Order Mapping (mentioning)
confidence: 99%
“…Further, there are statistical methods that achieve extremely good compression rates by generating probabilistic models based on genome datasets [57]. The fourth category is referential compression where any repeated sequence in an input dataset is replaced with a reference to one or more external DNA sequences [8–11]. A data structure plays a critical role in any algorithm designed for achieving good compression ratios, fast searching of patterns inside sequences, or both.…”
Section: Literature Review (mentioning)
confidence: 99%
“…Future work encompasses adding mechanisms for improved privacy-protection and compression of biological data [Alves et al 2015] to strengthen the overall security and efficiency of the system.…”
Section: A Hybrid Approach in the BiobankCloud PaaS (mentioning)
confidence: 99%