In the trace reconstruction problem, a length-n string x yields a collection of noisy copies, called traces, y1, …, yt, where each yi is independently obtained from x by passing through a deletion channel, which deletes every symbol with some fixed probability. The main goal under this paradigm is to determine the minimum number of i.i.d. traces required to reconstruct x with high probability. The trace reconstruction problem can be extended to the model where each trace is the result of x passing through a deletion-insertion-substitution channel, which also introduces insertions and substitutions. Motivated by the DNA storage channel, this work focuses on another variation of the trace reconstruction problem, referred to as the DNA reconstruction problem. A DNA reconstruction algorithm is a mapping that receives t traces y1, …, yt as input and produces x̂, an estimation of x. The goal in the DNA reconstruction problem is to minimize the edit distance between the original string and the algorithm's estimation; for the deletion channel case, the problem is referred to as the deletion DNA reconstruction problem and the goal is to minimize the Levenshtein distance. In this work, we present several new algorithms for these reconstruction problems. Our algorithms look globally at the entire sequence of traces and use dynamic programming algorithms, of the kind used for the shortest common supersequence and longest common subsequence problems, to decode the original sequence. Our algorithms do not impose any limitations on the input or the number of traces; moreover, they perform well even for error probabilities as high as 0.27. The algorithms have been tested on simulated data as well as on data from previous DNA experiments and are shown to outperform all previous algorithms.
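The abstract names, but does not include, the dynamic programs involved. As a minimal illustrative sketch (not the paper's code), the following Python computes a longest common subsequence of two traces, the DP primitive mentioned above:

```python
def lcs(a: str, b: str) -> str:
    """Longest common subsequence of two strings via the standard
    O(|a|*|b|) dynamic program."""
    n, m = len(a), len(b)
    # dp[i][j] = length of an LCS of a[:i] and b[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    # Backtrack to recover one LCS (several of equal length may exist).
    out, i, j = [], n, m
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i, j = i - 1, j - 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))

# Two noisy traces of the same strand; the LCS keeps only symbols
# supported by both copies.
print(lcs("ACGTTGCA", "ACTTGGCA"))  # one LCS of length 7, e.g. ACTTGCA
```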
This paper studies the problem of reconstructing a word given several of its noisy copies. This setup is motivated by several applications, among them reconstructing strands in DNA-based storage systems. Under this paradigm, a word is transmitted over some fixed number of identical independent channels, and the goal of the decoder is to output the transmitted word or some close approximation. The main focus of this paper is the case of two deletion channels and the error probability of the maximum-likelihood (ML) decoder under this setup. First, we discuss how the ML decoder operates. Then, we observe that the dominant error patterns are deletions in the same run or errors resulting from alternating sequences. Based on these observations, it is derived that the error probability of the ML decoder is roughly (3q − 1)/(q − 1) · p², when the transmitted word is any q-ary sequence and p is the channel's deletion probability. We also study the cases when the transmitted word belongs to the Varshamov-Tenengolts (VT) code or the shifted VT code. Lastly, the insertion channel is studied as well. These theoretical results are verified by corresponding simulations.
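As a hedged illustration of the quantities in this abstract (the function and variable names below are ours, not the paper's), the following Python sketches the deletion channel and evaluates the stated leading-order expression (3q − 1)/(q − 1) · p²:

```python
import random

def deletion_channel(x: str, p: float, rng: random.Random) -> str:
    """Pass x through a deletion channel: each symbol is deleted
    independently with probability p."""
    return "".join(s for s in x if rng.random() >= p)

def predicted_ml_error(q: int, p: float) -> float:
    """Leading-order estimate of the ML decoder's error probability
    over two deletion channels, as stated in the abstract:
    roughly ((3q - 1) / (q - 1)) * p**2."""
    return (3 * q - 1) / (q - 1) * p ** 2

rng = random.Random(0)
x = "".join(rng.choice("ACGT") for _ in range(100))  # a q = 4 alphabet
y1 = deletion_channel(x, 0.05, rng)
y2 = deletion_channel(x, 0.05, rng)
print(len(y1), len(y2))             # roughly 95 symbols survive per trace
print(predicted_ml_error(4, 0.05))  # ~0.00917 for q = 4, p = 0.05
```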
Flash memory is prevalent in modern servers and devices. Coupled with the scaling down of flash technology, the popularity of flash memory motivates the search for methods to increase flash reliability and lifetime. Erasures are the dominant cause of flash cell wear, but reducing them is challenging because flash is a write-once medium: memory cells must be erased prior to writing. An approach that has recently received considerable attention relies on write-once memory (WOM) codes, designed to accommodate additional writes on write-once media. However, the techniques proposed for reusing flash pages with WOM codes are limited in their scope. Many focus on the coding theory alone, whereas others suggest FTL designs that are application-specific, or not applicable due to their complexity, overheads, or specific constraints of multilevel cell (MLC) flash. This work is the first to address all aspects of page reuse within an end-to-end analysis of a general-purpose FTL on MLC flash. We use a hardware evaluation setup to directly measure the short- and long-term effects of page reuse on SSD durability and energy consumption, and show that FTL design must explicitly take them into account. We then provide a detailed analytical model for deriving the optimal garbage collection policy for such FTL designs, and for predicting the benefit from reuse under realistic hardware and workload characteristics.
This paper tackles two problems that fall under the study of coding for insertions and deletions. These problems are motivated by several applications, among them reconstructing strands in DNA-based storage systems. Under this paradigm, a word is transmitted over some fixed number of identical independent channels, and the goal of the decoder is to output the transmitted word or some close approximation of it. The first part of this paper studies the deletion channel, which deletes a symbol with some fixed probability p, while focusing on two instances of this channel. Since operating the maximum likelihood (ML) decoder in this case is computationally infeasible, we study a slightly degraded version of this decoder for two channels and analyze its expected normalized distance. We observe that the dominant error patterns are deletions in the same run or errors resulting from alternating sequences. Based on these observations, it is derived that the expected normalized distance of the degraded ML decoder is roughly (3q − 1)/(q − 1) · p², when the transmitted word is any q-ary sequence and p is the channel's deletion probability. We also study the cases when the transmitted word belongs to the Varshamov-Tenengolts (VT) code or the shifted VT code. Additionally, the insertion channel is studied, as well as the case of two insertion channels. These theoretical results are verified by corresponding simulations. The second part of the paper studies optimal decoding for a special case of the deletion channel, referred to as the k-deletion channel, which deletes exactly k symbols of the transmitted word uniformly at random. In this part, the goal is to understand how an optimal decoder operates in order to minimize the expected normalized distance. A full characterization of an efficient optimal decoder for this setup, referred to as the maximum likelihood* (ML*) decoder, is given for a channel that deletes one or two symbols. For k = 1, it is shown that when the code is the entire space, the optimal decoder is the lazy decoder, which simply returns the channel output. Similarly, for k = 2, it is shown that the decoder acts as the lazy decoder in almost all cases, and when the longest run is significantly long (roughly (2 − √2)n, where n is the word length), it prolongs the longest run by one symbol.
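The abstract states the k = 2 decision rule explicitly when the code is the entire space; the sketch below implements that stated rule directly. The names are ours, and the threshold handling is an assumption, since the abstract gives the constant only as roughly (2 − √2)n:

```python
import math

def longest_run(y: str) -> tuple[int, int]:
    """Return (start index, length) of the longest run in y."""
    best_start, best_len, i = 0, 0, 0
    while i < len(y):
        j = i
        while j < len(y) and y[j] == y[i]:
            j += 1
        if j - i > best_len:
            best_start, best_len = i, j - i
        i = j
    return best_start, best_len

def ml_star_two_deletions(y: str, n: int) -> str:
    """Decision rule described in the abstract for the 2-deletion
    channel over the entire space: act as the lazy decoder (return y),
    except when the longest run is very long -- roughly (2 - sqrt(2)) * n,
    with n the transmitted length -- in which case that run is
    prolonged by one symbol."""
    start, length = longest_run(y)
    if length >= (2 - math.sqrt(2)) * n:
        return y[:start] + y[start] + y[start:]  # extend the run by one
    return y  # lazy decoding: output the channel output as-is

print(ml_star_two_deletions("aaaaaaab", 10))  # long run of a's -> extended
print(ml_star_two_deletions("abababab", 10))  # no long run -> unchanged
```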
Flash memory is a write-once medium in which reprogramming cells requires first erasing the block that contains them. The lifetime of the flash is a function of the number of block erasures and can be as small as several thousand. To reduce the number of block erasures, pages, which are the smallest write unit, are rewritten out-of-place in the memory. A write-once memory (WOM) code is a coding scheme that enables writing to a block multiple times before an erasure. However, these codes come with a significant rate loss; for example, the rate for writing twice (with the same rate on each write) is at most 0.77. In this paper, we study WOM codes and their tradeoff between rate loss and reduction in the number of block erasures, when pages are written uniformly at random. First, we introduce a new measure, called the erasure factor, that reflects both the number of block erasures and the amount of data that can be written on each block. A key point in our analysis is that this tradeoff depends upon the specific implementation of WOM codes in the memory. We consider two systems that use WOM codes: a conventional scheme that was commonly used, and a recent design that preserves the overall storage capacity. While the first system can improve the erasure factor only when the storage rate is at most 0.6442, we show that the second scheme always improves this figure of merit.
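The abstract quotes rate figures without a construction. As a concrete example of a two-write WOM code (the classic Rivest-Shamir scheme, not a construction from this paper), the sketch below writes 2 bits twice into 3 write-once cells, for a rate of 2/3 per write, consistent with the 0.77 upper bound mentioned above:

```python
# Classic Rivest-Shamir two-write WOM code: 2 bits are written twice
# into 3 binary write-once cells (a cell may only change 0 -> 1).
FIRST = {0: (0, 0, 0), 1: (0, 0, 1), 2: (0, 1, 0), 3: (1, 0, 0)}
SECOND = {d: tuple(1 - c for c in p) for d, p in FIRST.items()}  # complements

def write(cells, data):
    """Program `data` (0..3) into `cells`, only ever raising cells:
    the first-generation pattern on fresh cells, the complement
    (second-generation) pattern afterwards."""
    if decode(cells) == data:
        return cells  # value already stored, nothing to program
    target = FIRST[data] if sum(cells) == 0 else SECOND[data]
    assert all(c <= t for c, t in zip(cells, target)), "write-once violated"
    return target

def decode(cells):
    """Weight <= 1 means a first-generation codeword; weight >= 2 means
    a second-generation (complement) codeword."""
    if sum(cells) <= 1:
        return next(d for d, p in FIRST.items() if p == cells)
    return next(d for d, p in SECOND.items() if p == cells)

cells = (0, 0, 0)
cells = write(cells, 2)      # first write:  value 2 -> 010
cells = write(cells, 3)      # second write: value 3 -> 011 (cells only rise)
print(cells, decode(cells))  # (0, 1, 1) 3
```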