Synthetic deoxyribonucleic acid (DNA) is a promising medium for long-term storage of digital data because of its high achievable storage density and outstanding longevity. However, the synthesis and sequencing of DNA in a DNA storage system are prone to a wide variety of errors, including insertion, deletion and mutation errors[a]. At the same time, DNA sequences with 50% GC content are known to be less susceptible to errors. This paper presents the construction of GC-balanced DNA sequences with error correction capability. A systematic code that corrects a single insertion, deletion or substitution error is first proposed and then used to design a GC-balanced scheme for synthesizing DNA sequences. With the proposed method, DNA sequences with exactly 50% GC content are constructed. Such sequences not only have maximum endurance to errors but can also correct both insertions/deletions and mutations of nucleotide bases. The decoding procedures for the sequences are described and can readily be used in practice. Simulation results show that the proposed GC-balanced DNA sequences can correct base errors adequately.

[a] Mutation errors occurring in DNA data storage systems are equivalent to substitution errors in conventional data storage and digital communication systems. In this paper, the terms mutation error and substitution error are used interchangeably.
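The abstract does not spell out how exact 50% GC content is reached. For orientation only, the following Python sketch shows a well-known alternative idea, Knuth's prefix-flipping balancing, adapted to GC content. It is a swapped-in technique and not necessarily the construction proposed in this paper. The base-swap table `FLIP` and the function names are hypothetical. The prefix length `k` is simply returned here, whereas a practical scheme would also have to store it and protect it against errors.

```python
# Hypothetical base swap that toggles GC status: A<->C and T<->G.
# Every A/T base becomes G/C and vice versa, and applying it twice is the identity.
FLIP = {"A": "C", "C": "A", "T": "G", "G": "T"}


def gc_count(seq: str) -> int:
    """Number of G or C bases in seq."""
    return sum(1 for b in seq if b in "GC")


def gc_content(seq: str) -> float:
    """Fraction of G or C bases in seq."""
    return gc_count(seq) / len(seq)


def knuth_gc_balance(seq: str) -> tuple[str, int]:
    """Flip the first k bases for the smallest k that gives exactly 50% GC.

    Returns (balanced_sequence, k). The length must be even.
    """
    n = len(seq)
    if n % 2 != 0:
        raise ValueError("exact 50% GC balance needs an even length")
    target = n // 2
    count = gc_count(seq)  # GC count with the first k = 0 bases flipped
    for k in range(n + 1):
        if count == target:
            balanced = "".join(FLIP[b] for b in seq[:k]) + seq[k:]
            return balanced, k
        if k < n:
            # Flipping base k toggles its GC status, so the count moves by exactly 1.
            # It goes from g (k = 0) to n - g (k = n), so it must pass through n / 2.
            count += -1 if seq[k] in "GC" else 1
    raise AssertionError("unreachable: some prefix length always balances")


def undo_balance(balanced: str, k: int) -> str:
    """Recover the original sequence by flipping the first k bases back."""
    return "".join(FLIP[b] for b in balanced[:k]) + balanced[k:]
```

For example, `knuth_gc_balance("AAAATTTT")` flips the first four bases and returns `("CCCCTTTT", 4)`, and `undo_balance("CCCCTTTT", 4)` restores `"AAAATTTT"`. The appeal of this kind of scheme is that the GC count changes by exactly one per flipped base, which guarantees that a balancing prefix exists.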