Synchronization (insertion-deletion) errors remain a major challenge for reliable information retrieval in DNA storage. Unlike traditional error correction codes (ECC), which add redundancy to the stored information, multiple sequence alignment (MSA) addresses this problem by searching for conserved subsequences. In this paper, we conduct a comprehensive simulation study of the error correction capability of a typical MSA algorithm, MAFFT. Our results reveal that its capability exhibits a phase transition at an error rate of around 20%. Below this critical value, increasing the sequencing depth eventually allows it to approach complete recovery. Above it, performance plateaus at a poor level. Given a reasonable sequencing depth (≤ 70), MSA can achieve complete recovery in the low error regime and effectively correct 90% of the errors in the medium error regime. In addition, MSA is robust to imperfect clustering. It can also be combined with other means such as ECC, repeated markers, or other code constraints. Furthermore, by selecting an appropriate sequencing depth, this strategy can achieve an optimal trade-off between cost and reading speed. MSA could be a competitive alternative for future DNA storage.
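To illustrate how aligned reads can yield a corrected sequence, the following is a minimal sketch of a column-wise majority vote over reads that have already been aligned (for example by MAFFT, which is not invoked here). The function name `consensus`, the gap symbol `-`, and the toy reads are assumptions for illustration, not the procedure used in the study.

```python
from collections import Counter

GAP = "-"  # assumed gap symbol, as commonly used in aligned FASTA output


def consensus(aligned_reads):
    """Column-wise majority vote over equal-length aligned reads.

    Columns whose most common symbol is a gap are dropped, which removes
    insertions seen in a minority of reads. Ties are resolved in favour of
    the symbol that appears first in the column.
    """
    if not aligned_reads:
        return ""
    width = len(aligned_reads[0])
    if any(len(read) != width for read in aligned_reads):
        raise ValueError("aligned reads must all have the same length")

    out = []
    for column in zip(*aligned_reads):
        symbol, _count = Counter(column).most_common(1)[0]
        if symbol != GAP:
            out.append(symbol)
    return "".join(out)


# Toy alignment of five noisy reads of the hypothetical reference "ACGTACGT".
aligned = [
    "ACGT-ACGT",
    "ACGTTACGT",  # read with one insertion
    "ACGT-AC-T",  # read with one deletion
    "ACGT-ACGT",
    "ACCT-ACGT",  # read with one substitution
]
recovered = consensus(aligned)
print(recovered)
```

Adding more reads per column makes each vote more reliable, which matches the intuition that a greater sequencing depth helps as long as the alignment itself is still correct.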
DNA, or deoxyribonucleic acid, is a powerful molecule that plays a fundamental role in storing and processing the genetic information of all living organisms. In recent years, scientists around the world have sought to exploit its high density, energy efficiency, and durability to address challenges in information technology. Here, we propose building an instance-based learning model from DNA molecules. The handwritten digit images in the MNIST dataset are encoded as DNA sequences using a deep learning encoder, and the reverse complement of a query image's sequence is used to hybridize with the training instance sequences. Simulation results from NUPACK show that this DNA-based classification model can achieve 95% accuracy on average. Wet-lab experiments also confirm that the predicted yield is consistent with the hybridization strength. Our work shows that it is feasible to build an effective instance-based classification model for practical applications.
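The reverse-complement step can be sketched as follows. This is a toy illustration under stated assumptions: the sequences, the labels, and the `pairing_score` function (a simple count of Watson-Crick paired positions) are hypothetical stand-ins, not the deep learning encoder or the thermodynamic model computed by NUPACK.

```python
# Translation table for Watson-Crick complements.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def pairing_score(probe, instance):
    """Count positions where the probe would base-pair with the instance.

    A probe pairs perfectly with an instance when the probe's reverse
    complement equals the instance, so we compare those two strings.
    This is a crude proxy for hybridization strength (an assumption).
    """
    target = reverse_complement(probe)
    return sum(a == b for a, b in zip(target, instance))


# Hypothetical encoded training instances, keyed by digit label.
instances = {"3": "ACGTTGCA", "7": "GGATCCAA"}

# Hypothetical encoded query image and its hybridization probe.
query = "ACGTTGCT"
probe = reverse_complement(query)

scores = {label: pairing_score(probe, seq) for label, seq in instances.items()}
predicted = max(scores, key=scores.get)
print(probe, scores, predicted)
```

The label with the highest score plays the role of the nearest training instance, in the spirit of instance-based classification.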
With the rapid development of DNA (deoxyribonucleic acid) storage technologies, storing digital images in DNA has become feasible. However, information security in DNA storage systems remains an open problem. In this paper, we therefore propose a DNA storage-oriented image encryption algorithm that uses information processing mechanisms from molecular biology. The basic idea is to perform pixel replacement by gene hybridization and to implement dual diffusion through pixel diffusion and gene mutation. After encryption, the ciphertext DNA image can be synthesized and stored in a DNA storage system. Experimental results demonstrate that the algorithm resists common attacks and is strongly robust against sequence loss and base substitution errors in the DNA storage channel.
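As background for how pixel values can be represented as bases at all, here is a minimal sketch of a two-bits-per-base mapping. The rule `00→A, 01→C, 10→G, 11→T` and the function names are assumptions chosen for illustration; the abstract does not state which coding rule the algorithm uses, and the encryption steps themselves are not reproduced here.

```python
BASES = "ACGT"  # assumed coding rule: 00->A, 01->C, 10->G, 11->T


def pixel_to_dna(value):
    """Encode one 8-bit pixel value as four bases, most significant bits first."""
    if not 0 <= value <= 255:
        raise ValueError("pixel value must be in the range 0..255")
    return "".join(BASES[(value >> shift) & 0b11] for shift in (6, 4, 2, 0))


def dna_to_pixel(quad):
    """Decode four bases back into one 8-bit pixel value."""
    if len(quad) != 4:
        raise ValueError("expected exactly four bases")
    value = 0
    for base in quad:
        value = (value << 2) | BASES.index(base)
    return value


# A hypothetical row of grayscale pixels.
row = [0, 27, 128, 255]
encoded = "".join(pixel_to_dna(p) for p in row)
decoded = [dna_to_pixel(encoded[i:i + 4]) for i in range(0, len(encoded), 4)]
print(encoded, decoded)
```

Because every pixel maps to a fixed-length group of bases, operations defined on bases can be applied to the image and then decoded back to pixels.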
Rapid development in synthesis technologies has promoted DNA as a potential medium for large-scale data storage. However, how to implement data security in DNA storage systems is still an unsolved problem. In this paper, we propose an image encryption method based on the modulation-based storage architecture. The key idea is to use unpredictable modulation signals to encrypt images in the highly error-prone DNA storage channel. Numerical results demonstrate that our image encryption method is feasible and effective, with excellent security against various attacks (statistical, differential, noise, data loss, etc.). Compared with other methods based on DNA hybridization reactions, the proposed method is more reliable and feasible for large-scale applications.