We consider the problem of storing and retrieving information from synthetic DNA media. The mathematical basis of the problem is the construction and design of sequences that may be discriminated based on their collection of substrings observed through a noisy channel. This problem of reconstructing sequences from traces was first investigated in the noiseless setting under the name of "Markov type" analysis. Here, we explain the connection between the reconstruction problem and the problem of DNA synthesis and sequencing, and introduce the notion of a DNA storage channel. We analyze the number of sequence equivalence classes under the channel mapping and propose new asymmetric coding techniques to combat the effects of synthesis and sequencing noise. In our analysis, we make use of restricted de Bruijn graphs and Ehrhart theory for rational polytopes.
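As a rough illustration of what "discriminated based on their collection of substrings" means, the sketch below computes the multiset of length-ell substrings of a sequence and checks whether two sequences share it. It covers only the noiseless case; the function names and the parameter `ell` are hypothetical and are not taken from the paper.

```python
from collections import Counter


def substring_profile(seq, ell):
    """Multiset of all length-ell substrings of seq (hypothetical helper)."""
    return Counter(seq[i:i + ell] for i in range(len(seq) - ell + 1))


def same_class(a, b, ell):
    """True if a and b cannot be told apart from their length-ell substrings."""
    return substring_profile(a, ell) == substring_profile(b, ell)


# Two different sequences with the same multiset of 2-substrings
# ({AC, CA, AG, GA}); they differ once 3-substrings are observed.
print(same_class("ACAGA", "AGACA", 2))
print(same_class("ACAGA", "AGACA", 3))
```

Sequences that share a profile fall into one equivalence class in this noiseless toy model, so a code would keep at most one representative from each class.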
A cross-bifix-free code is a set of words in which no proper prefix of any word is a suffix of any word in the set. Cross-bifix-free codes arise in the study of distributed sequences for frame synchronization. We provide a new construction of cross-bifix-free codes which generalizes the construction by Bajic to longer code lengths and to any alphabet size. The codes are shown to be nearly optimal in size. We also establish new results on Fibonacci sequences, which are used in estimating the size of the cross-bifix-free codes.
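The defining property can be checked directly on a small set of words. The sketch below assumes that all words have the same length n and that "prefix" means a prefix of length 1 to n - 1; the function name is hypothetical, and this is a brute-force checker, not the construction from the paper.

```python
def is_cross_bifix_free(words):
    """Check that no proper prefix of any word equals a suffix of any word.

    Assumes equal-length words; the comparison includes a word with itself.
    """
    words = list(words)
    if not words:
        return True
    n = len(words[0])
    if any(len(w) != n for w in words):
        raise ValueError("all words must have the same length")
    # Strings of different lengths are never equal, so one set per side is enough.
    prefixes = {w[:k] for w in words for k in range(1, n)}
    suffixes = {w[-k:] for w in words for k in range(1, n)}
    return prefixes.isdisjoint(suffixes)


print(is_cross_bifix_free(["00101", "00111"]))  # no prefix matches a suffix
print(is_cross_bifix_free(["0011", "0101"]))    # "01" is a prefix and a suffix of "0101"
```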
We study (symbol-pair) codes for symbol-pair read channels introduced recently by Cassuto and Blaum (2010). A Singleton-type bound on symbol-pair codes is established and infinite families of optimal symbol-pair codes are constructed. These codes are maximum distance separable (MDS) in the sense that they meet the Singleton-type bound. In contrast to classical codes, where all known q-ary MDS codes have length O(q), we show that q-ary MDS symbol-pair codes can have length Ω(q^2). In addition, we completely determine the existence of MDS symbol-pair codes for certain parameters.
Index Terms: Codes for magnetic storage, maximal distance separable, Singleton-type bound, symbol-pair read channels.
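To make the symbol-pair read model concrete, the sketch below computes the pair distance between two words, assuming the usual cyclic pair-read convention in which position i is read together with position i + 1 modulo n. The function names are hypothetical, and the code illustrates the metric only, not the MDS constructions.

```python
def pair_read(word):
    """Cyclic symbol-pair read vector: ((x0, x1), (x1, x2), ..., (x_{n-1}, x0))."""
    n = len(word)
    return [(word[i], word[(i + 1) % n]) for i in range(n)]


def pair_distance(x, y):
    """Hamming distance between the pair-read vectors of x and y."""
    if len(x) != len(y):
        raise ValueError("words must have the same length")
    return sum(a != b for a, b in zip(pair_read(x), pair_read(y)))


def min_pair_distance(code):
    """Minimum pair distance over all pairs of distinct codewords."""
    code = list(code)
    return min(
        pair_distance(code[i], code[j])
        for i in range(len(code))
        for j in range(i + 1, len(code))
    )


# A single symbol change affects the two pairs that contain it.
print(pair_distance((0, 0, 0, 0), (0, 1, 0, 0)))
# Two adjacent symbol changes affect three pairs.
print(pair_distance((0, 0, 0, 0), (0, 1, 1, 0)))
```

A single symbol error always disturbs two overlapping pairs, which is why distances in this metric behave differently from Hamming distances.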
Racetrack memory is a new technology which utilizes magnetic domains along a nanoscopic wire in order to obtain extremely high storage density. In racetrack memory, each magnetic domain can store a single bit of information, which can be sensed by a reading port (head). The memory has a tape-like structure which supports a shift operation that moves the domains to be read sequentially by the head. In order to increase the memory's speed, prior work studied how to minimize the latency of the shift operation, while the no less important reliability of this operation has received little attention. In this work we design codes which combat shift errors in racetrack memory, called position errors. Namely, shifting the domains is not an error-free operation: the domains may be over-shifted or not shifted at all, which can be modeled as deletions and sticky insertions, respectively. While it is possible to use conventional deletion and insertion-correcting codes, we tackle this problem with the special structure of racetrack memory, where the domains can be read by multiple heads. Each head outputs a noisy version of the stored data and the multiple outputs are combined in order to reconstruct the data. Under this paradigm, we show that it is possible to correct, with at most a single bit of redundancy, d deletions with d + 1 heads if the heads are well-separated. Similar results are provided for bursts of deletions, sticky insertions, and combinations of both deletions and sticky insertions.