Synthetic DNA is widely regarded as an attractive medium for digital data storage. However, random insertion-deletion-substitution (IDS) errors in sequenced reads remain a critical obstacle to reliable data recovery. Motivated by modulation techniques from the communications field, we propose a new DNA storage architecture to address this problem. The main idea is that all binary data are modulated into DNA sequences that share the same AT/GC pattern, which facilitates the detection of indels in noisy reads. The modulation signal not only satisfies the encoding constraints but also serves as prior information for locating potential errors. Experiments on simulated and real data sets demonstrate that modulation encoding provides a simple way to comply with the biological constraints on sequence encoding (i.e., balanced GC content and avoidance of homopolymers). Furthermore, modulation decoding is highly efficient and extremely robust, correcting up to ∼40% of errors. It is also robust to imperfect clustering reconstruction, which is very common in practice. Although our method has a relatively low logical density of 1.0 bits/nt, its high robustness may leave ample room for developing low-cost synthesis technologies. We believe this new architecture may accelerate the arrival of large-scale DNA storage applications.
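The modulation idea can be illustrated with a minimal Python sketch. The abstract states only that every sequence shares one AT/GC pattern and that the density is 1.0 bits/nt; the concrete base mapping, the alternating carrier, and the function names below are assumptions made for illustration, not the authors' implementation. Here a carrier bit selects the base class (A/T or G/C) and a data bit selects the base within that class, so a read whose AT/GC pattern stops matching the carrier hints at where an indel occurred.

```python
# Assumed mapping (illustrative, not taken from the paper):
# key = (carrier bit, data bit); carrier 0 -> {A, T}, carrier 1 -> {G, C}.
BASES = {(0, 0): "A", (0, 1): "T", (1, 0): "G", (1, 1): "C"}
DECODE = {base: key for key, base in BASES.items()}


def modulate(data_bits, carrier_bits):
    """Encode one data bit per nucleotide under a fixed carrier pattern."""
    if len(data_bits) != len(carrier_bits):
        raise ValueError("data and carrier must have the same length")
    return "".join(BASES[(c, d)] for c, d in zip(carrier_bits, data_bits))


def demodulate(seq):
    """Recover the data bits from an error-free sequence."""
    return [DECODE[base][1] for base in seq]


def pattern_mismatches(read, carrier_bits):
    """Positions where the read's AT/GC class disagrees with the carrier."""
    return [
        i
        for i, (base, c) in enumerate(zip(read, carrier_bits))
        if DECODE[base][0] != c
    ]


# An alternating carrier gives 50% GC content and no homopolymer runs.
carrier = [0, 1] * 4
data = [1, 0, 1, 1, 0, 0, 1, 0]
seq = modulate(data, carrier)            # "TGTCAGTG"
noisy = seq[:2] + seq[3:]                # delete the base at index 2
print(seq, demodulate(seq) == data)
print(pattern_mismatches(noisy, carrier))  # mismatches begin at the deletion
```

With an alternating carrier, a single deletion shifts every later base out of phase, so the mismatch run starts at the deleted position. A substitution that stays within the same base class would not be visible to this check, which is why the sketch covers only indel detection.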
DNA, or deoxyribonucleic acid, is a powerful molecule that plays a fundamental role in storing and processing the genetic information of all living organisms. In recent years, researchers worldwide have sought to exploit its high density, energy efficiency, and long durability to address challenges in information technology. Here, we propose building an instance-based learning model with DNA molecules. The handwritten digit images in the MNIST dataset are encoded as DNA sequences using a deep learning encoder, and the reverse-complement sequence of a query image is used to hybridize with the training instance sequences. Simulation results from NUPACK show that this DNA-based classification model achieves 95% accuracy on average. Wet-lab experiments also confirm that the predicted yield is consistent with the hybridization strength. Our work shows that it is feasible to build an effective instance-based classification model for practical applications.
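The classification step can be pictured as nearest-neighbour matching by hybridization. The following Python sketch is a toy stand-in only: it scores Watson-Crick pairing position by position, whereas the abstract reports thermodynamic simulation with NUPACK, and the sequences, labels, and function names are hypothetical. The deep learning encoder that maps images to sequences is not modelled here.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def complementarity(probe, target):
    """Fraction of positions where the probe, read antiparallel, pairs with
    the target. A crude proxy for hybridization strength (assumption)."""
    if len(probe) != len(target):
        raise ValueError("probe and target must have the same length")
    paired = reverse_complement(probe)
    return sum(a == b for a, b in zip(paired, target)) / len(target)


def classify(query_seq, training):
    """Label of the training instance that pairs best with the query probe.

    `training` is a list of (sequence, label) pairs.
    """
    probe = reverse_complement(query_seq)
    best_seq, best_label = max(
        training, key=lambda item: complementarity(probe, item[0])
    )
    return best_label


# Hypothetical encoded instances; real ones would come from the encoder.
training = [("ACGTACGT", 3), ("TTGGCCAA", 7)]
print(reverse_complement("AACG"))
print(classify("ACGTACGA", training))
```

In this toy setting the query differs from the first training sequence at one position, so its probe pairs best with that instance and inherits its label.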
Rapid developments in synthesis technologies have made DNA a promising medium for large-scale data storage. However, how to implement data security in DNA storage systems remains an unsolved problem. In this paper, we propose an image encryption method based on the modulation-based storage architecture. The key idea is to use the unpredictable modulation signals to encrypt images in the highly error-prone DNA storage channel. Numerical results demonstrate that our image encryption method is feasible and effective, with excellent security against various attacks (statistical, differential, noise, and data-loss attacks, among others). Compared with methods based on DNA hybridization reactions, the proposed method is more reliable and more feasible for large-scale applications.