2022
DOI: 10.48550/arxiv.2201.03590
Preprint

Reassembly Codes for the Chop-and-Shuffle Channel

Abstract: We study the problem of retrieving data from a channel that breaks the input sequence into a set of unordered fragments of random lengths, which we refer to as the chop-and-shuffle channel. The length of each fragment follows a geometric distribution. We propose nested Varshamov-Tenengolts (VT) codes to recover the data. We evaluate the error rate and the complexity of our scheme numerically. Our results show that the decoding error decreases as the input length increases, and our method has a significantly lo…
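
As a rough illustration of the channel described in the abstract, the following Python sketch cuts the input after each symbol independently with probability `p`, which gives geometrically distributed fragment lengths (apart from the truncated last fragment), and then shuffles the fragments. The function names, the parameter `p`, and the per-symbol cut model are assumptions made for this sketch and are not taken from the paper. The `vt_syndrome` helper computes the standard checksum that defines a single binary VT code; it does not reproduce the paper's nested construction or its decoder.

```python
import random


def chop_and_shuffle(seq, p, rng=None):
    """Hypothetical channel model: cut after each symbol with probability p,
    then return the resulting fragments in random order."""
    rng = rng or random.Random()
    fragments = []
    start = 0
    # A cut can fall between positions i-1 and i for i = 1 .. len(seq)-1.
    for i in range(1, len(seq)):
        if rng.random() < p:
            fragments.append(seq[start:i])
            start = i
    fragments.append(seq[start:])  # last (possibly truncated) fragment
    rng.shuffle(fragments)         # the receiver sees an unordered set
    return fragments


def vt_syndrome(bits):
    """Standard VT checksum: sum of i * x_i (1-indexed) modulo n + 1.
    A binary word x of length n lies in VT_a(n) when this equals a."""
    n = len(bits)
    return sum(i * b for i, b in enumerate(bits, start=1)) % (n + 1)


if __name__ == "__main__":
    message = "0110100111"
    received = chop_and_shuffle(message, p=0.3, rng=random.Random(0))
    print(received)                      # unordered fragments of the message
    print(vt_syndrome([1, 0, 1, 1]))     # (1 + 3 + 4) mod 5
```

Because the fragments arrive unordered, the decoder has to find an ordering whose concatenation is a valid codeword; a checksum constraint such as the VT syndrome is one way to rule out wrong orderings.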

Cited by 3 publications (5 citation statements)
References: 19 publications
“…Multiple channel models have recently been suggested and studied based on this property. An assumption of overlap in read substrings and (near) uniform coverage leads to the problem of string reconstruction from substring composition [3], [4], [7], [13], [21], [22], [27], [29]; on the contrary, assuming no overlap in read substrings leads to the torn-paper problem [23], [25], [30], a problem closely related to the shuffling channel [16], [17], [28], [32]. This problem is motivated by DNA-based storage systems, where the information is stored in synthesized strands of DNA molecules.…”
Section: Introduction (mentioning)
confidence: 99%
“…In the torn-paper channel [25], [30], also known as the chop-and-shuffle channel [23], a long information sequence is…”
Section: Introduction (mentioning)
confidence: 99%
“…Notably, when observations are composed of consecutive substrings, the reconstruction from substring-compositions problem [1], [1], [4], [10], [12], [17], [19], [20], [25], [27], [31], [32] and the torn-paper problem [2], [21], [22], [28] (a problem closely related to the shuffling channel [11], [13], [26], [30]) have received significant interest in the past decade due to applications in DNA- or polymer-based storage systems, resulting from contemporary sequencing technologies [4], [9], [20]. The former arises from an idealized assumption of full overlap (and uniform coverage) in read substrings, while the latter results from an assumption of no overlap; in applications, this models the question of whether the complete information string may be replicated and uniformly segmented for sequencing, or if segmentation occurs adversarially in the medium prior to sequencing.…”
Section: Introduction (mentioning)
confidence: 99%
“…Notably, when observations are comprised of unordered consecutive substrings, two distinct models have received significant interest in the past decade due to applications in DNA- or polymer-based storage systems, resulting from contemporary sequencing technologies [4], [12], [26]. The first is the reconstruction from substring-compositions problem [1], [4], [10], [14], [17], [23], [25], [26], [32], [34], [38], [39] (including extensions for erroneous observations [5], [12], [23], [39]), which arises from an idealized assumption of full overlap (and uniform coverage) in read substrings; the second is the torn-paper problem [2], [27], [28], [35] (a problem closely related to the shuffling channel [15], [18], [33], [37]), which results from an assumption of no overlap. In applications, the distinction models the question of whether the complete information string may be replicated and uniformly segmented for sequencing, or if segmentation occurs adversarially in the medium prior to sequencing.…”
Section: Introduction (mentioning)
confidence: 99%