Forward Error Correction for DNA Data Storage

Blawat, Meinolf; Gaedke, K.; Hütter, Ingo; Chen, Xiaoming; Turczyk, Brian M.; Inverso, Samuel A.; Pruitt, Benjamin W.; Church, George M.

doi:10.1016/j.procs.2016.05.398

Cited by 268 publications

(264 citation statements)

References 1 publication


“…DNA is an excellent medium for data storage with demonstrated information density of petabytes of data per gram, high durability, and evolutionarily optimized machinery to faithfully replicate this information [1,2]. Recently, a series of proof-of-principle experiments have demonstrated the value of DNA as a storage medium [3][4][5][6][7][8][9].…”

Section: Text (mentioning)

confidence: 99%


DNA Fountain enables a robust and efficient storage architecture

Erlich, Zielinski. 2016. Preprint.


DNA is an attractive medium to store digital information. Here, we report a storage strategy, called DNA Fountain, that is highly robust and approaches the information capacity per nucleotide. Using our approach, we stored a full computer operating system, movie, and other files with a total of 2.14×10^6 bytes in DNA oligos and perfectly retrieved the information from a sequencing coverage equivalent of a single tile of Illumina sequencing. We also tested a process that can allow 2.18×10^15 retrievals using the original DNA sample and were able to perfectly decode the data. Finally, we explored the limit of our architecture in terms of bytes per molecule and obtained a perfect retrieval from a density of 215 petabytes per gram of DNA, orders of magnitude higher than previous techniques.

bioRxiv preprint, first posted online Sep. 9, 2016. doi: http://dx.doi.org/10.1101/074237. Made available under a CC-BY-NC 4.0 International license.

Humanity is currently producing data at exponential rates, creating a demand for better storage devices. DNA is an excellent medium for data storage with demonstrated information density of petabytes of data per gram, high durability, and evolutionarily optimized machinery to faithfully replicate this information [1,2]. Recently, a series of proof-of-principle experiments have demonstrated the value of DNA as a storage medium [3-9]. To better understand its potential, we explored the Shannon information capacity [10,11] of DNA storage [12]. This measure sets a tight upper bound on the amount of information that can be reliably stored in each nucleotide. In an ideal world, the information capacity of each nucleotide could reach 2 bits, since there are four possible options. However, DNA encoding faces several practical limitations.

First, not all DNA sequences are created equal [13,14]. Biochemical constraints dictate that DNA sequences with high GC content or long homopolymer runs (e.g. AAAAAA…) should be avoided as they are difficult to


“…AAAAAA…) should be avoided as they are difficult to synthesize and prone to sequencing errors. Second, oligo synthesis, PCR amplification, and decay of DNA during storage can induce uneven representation of the oligos [7,15]. This might result in dropout of a small fraction of oligos that will not be available for decoding.…”

Section: Text (mentioning)

confidence: 99%
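
The quoted passages above name two sequence-level constraints: high GC content and long homopolymer runs. As a rough illustration only, the Python sketch below screens a candidate oligo against both. The function names and the thresholds (a 45-55% GC window and a maximum run of 3) are assumptions chosen for the example; they are not taken from the quoted text or from any of the cited papers.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of bases in seq that are G or C (seq must be non-empty)."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of one repeated base (0 for an empty string)."""
    longest = 0
    run = 0
    previous = None
    for base in seq.upper():
        run = run + 1 if base == previous else 1
        longest = max(longest, run)
        previous = base
    return longest


def passes_screen(seq: str, gc_min: float = 0.45, gc_max: float = 0.55,
                  max_run: int = 3) -> bool:
    """Return True if seq satisfies both constraints.

    The default thresholds are illustrative assumptions, not published values.
    """
    if not seq:
        return False
    gc_ok = gc_min <= gc_fraction(seq) <= gc_max
    run_ok = longest_homopolymer(seq) <= max_run
    return gc_ok and run_ok


print(passes_screen("ACGTACGTAC"))  # True: GC fraction 0.5, longest run 1
print(passes_screen("AAAAAACGCG"))  # False: GC fraction 0.4, run of 6 A's
```

An encoder that enforces constraints like these has to reject or re-map some sequences, which is one reason the achievable capacity falls below the ideal 2 bits per nucleotide.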


“…Due to its high theoretical data density of 455 exabyte per gram [1] and its high stability [2], DNA has recently been proposed as a capable digital data storage medium. Poems, books, music, images and whole operating systems have already been stored in and successfully retrieved from synthetic DNA [3][4][5][6]. Another advantage of DNA as a technical data storage substrate is that by having the same properties as natural DNA, it can be read using the same high-throughput "next-generation sequencing" (NGS) platforms.…”

Section: Main Text (mentioning)

confidence: 99%

Genomic encryption of digital data stored in synthetic DNA

Grass, Heckel, Dessimoz, et al. 2019. Preprint.


Today, we can read human genomes and store digital data robustly in synthetic DNA. Here we report a strategy to intertwine these two technologies to enable the secure storage of valuable information in synthetic DNA, protected with personalized keys. We show that genetic short tandem repeats (STRs) contain sufficient entropy to generate strong encryption keys, and that only one technology, DNA sequencing, is required to simultaneously read key and data. Using this approach, we experimentally generated 80 bit strong keys from human DNA, and used such a key to encrypt 17 kB of digital information stored in synthetic DNA. Finally, the decrypted information was recovered perfectly from a single massively parallel sequencing run.


“…Church et al 3 Goldman et al 5 Bornholt et al 4 Erlich and Zielinski 15 Blawat et al 6 Yadzi et al 16…”

Section: Data Size (mentioning)

confidence: 99%

Scaling up DNA data storage and random access retrieval

Organick, Ang, Chen, et al. 2017. Preprint.


Current storage technologies can no longer keep pace with exponentially growing amounts of data [1]. Synthetic DNA offers an attractive alternative due to its potential information density of ~10^18 B/mm^3, 10^7 times denser than magnetic tape, and potential durability of thousands of years [2]. Recent advances in DNA data storage have highlighted technical challenges, in particular, coding and random access, but have stored only modest amounts of data in synthetic DNA [3,4,5]. This paper demonstrates an end-to-end approach toward the viability of DNA data storage with large-scale random access. We encoded and stored 35 distinct files, totaling 200 MB of data, in more than 13 million DNA oligonucleotides (about 2 billion nucleotides in total) and fully recovered the data with no bit errors, representing an advance of almost an order of magnitude compared to prior work [6]. Our data curation focused on technologically advanced data types and historical relevance, including the Universal Declaration of Human Rights in over 100 languages [7], a high-definition music video of the band OK Go [8], and a CropTrust database of the seeds stored in the Svalbard Global Seed Vault [9]. We developed a random access methodology based on selective amplification, for which we designed and validated a large library of primers, and successfully retrieved arbitrarily chosen items from a subset of our pool containing 10.3 million DNA sequences. Moreover, we developed a novel coding scheme that dramatically reduces the physical redundancy (sequencing read coverage) required for error-free decoding to a median of 5x, while maintaining levels of logical redundancy comparable to the best prior codes. We further stress-tested our coding approach by successfully decoding a file using the more error-prone nanopore-based sequencing.

We provide a detailed analysis of errors in the process of writing, storing, and reading data from synthetic DNA at a large scale, which helps characterize DNA as a storage medium and justify our coding approach. Thus, we have demonstrated a significant improvement in data volume, random access, and encoding/decoding schemes that contribute to a whole-system vision for DNA data storage.

bioRxiv preprint, first posted online Mar. 7, 2017. doi: http://dx.doi.org/10.1101/114553. All rights reserved. No reuse allowed without permission.

Storing digital data using synthetic DNA encompasses mapping bits into nucleotide sequences, synthesizing the corresponding molecules, and storing them in an appropriate environment. Reading the information requires sequencing and converting the stored DNA back into digital data. Our project explores this DNA data storage workflow end-to-end (Figure 1a). We focus on scaling up data volumes and solving associated key challenges. Specifically, we address the need to access data selectively rather than in bulk and the need to minimize the amount of sequencing required to completely recover stored data. Most pr...
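
The text above describes storage as "mapping bits into nucleotide sequences" and reading as converting the DNA back into digital data. The Python sketch below shows the simplest such mapping: two bits per base, which is the ideal ceiling because four bases give log2(4) = 2 bits. It is a toy illustration under stated assumptions, not the coding scheme of Organick et al. or of any other cited paper. The base assignment (00 to A, 01 to C, 10 to G, 11 to T) and the function names are hypothetical, and the sketch applies no error correction and no sequence constraints.

```python
# Assumed base assignment for illustration: 00->A, 01->C, 10->G, 11->T.
BASES = "ACGT"


def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four bases, most significant bit pair first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def dna_to_bytes(seq: str) -> bytes:
    """Invert bytes_to_dna; the sequence length must be a multiple of 4."""
    if len(seq) % 4 != 0:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            # str.index raises ValueError for any character outside ACGT.
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


encoded = bytes_to_dna(b"\x1b")  # 0x1b is 00 01 10 11 in binary
print(encoded)                   # ACGT
print(dna_to_bytes(encoded) == b"\x1b")  # True
```

A direct mapping like this can emit long homopolymer runs (a zero byte becomes AAAA), so practical schemes give up some of the 2 bits per nucleotide to avoid such sequences and to add redundancy.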
