2013
DOI: 10.1186/1471-2105-14-272

Levenshtein error-correcting barcodes for multiplexed DNA sequencing

Abstract: Background: High-throughput sequencing technologies are improving in quality, capacity and costs, providing versatile applications in DNA and RNA research. For small genomes or fraction of larger genomes, DNA samples can be mixed and loaded together on the same sequencing track. This so-called multiplexing approach relies on a specific DNA tag or barcode that is attached to the sequencing or amplification primer and hence appears at the beginning of the sequence in every read. After sequencing, each sample read …

Citations: cited by 110 publications (79 citation statements)
References: 27 publications
“…This property makes it suitable for use in DNA context (for full definitions and descriptions of this distance and algorithms described in this subsection see our previous work, [7]). The prefix of a sequence can be an empty sequence, the sequence itself, or any subsequence starting from position 1 up to any end position.…”
Section: Methods (mentioning)
confidence: 99%
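The prefix definition quoted above can be illustrated with a minimal Python sketch. The function name `prefixes` is an illustrative choice, not something taken from the cited work.

```python
def prefixes(seq: str) -> list[str]:
    """Return every prefix of seq: the empty sequence, each run of
    characters starting at position 1, and finally seq itself."""
    return [seq[:end] for end in range(len(seq) + 1)]


print(prefixes("ACGT"))  # ['', 'A', 'AC', 'ACG', 'ACGT']
```

A sequence of length n therefore has n + 1 prefixes.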
“…However, in the case of multiple samples having been sequenced simultaneously, which are distinguishable from one another by short, unique barcodes, these reads can cause more severe issues as it becomes necessary to demultiplex the read data in the presence of potential sequencing errors in these barcodes. To circumvent this problem, several methods have been developed for designing barcodes with an error-correction capability that aid correct sample identification in the presence of sequence alterations introduced during synthesis, primer ligation, amplification or sequencing (Buschmann and Bystrykh, 2013). Popular error-correcting techniques include methods based on false discovery rate statistics (see, for example, Buschmann and Bystrykh, 2013) as well as adaptations of both Hamming codes (see, for example, Hamady et al, 2008;Bystrykh, 2012) and Levenshtein codes (for example, implemented in the software Sequence-Levenshtein (Buschmann and Bystrykh, 2013) and TagGD (Costea et al, 2013)).…”
Section: Quality Assessment (mentioning)
confidence: 99%
“…To circumvent this problem, several methods have been developed for designing barcodes with an error-correction capability that aid correct sample identification in the presence of sequence alterations introduced during synthesis, primer ligation, amplification or sequencing (Buschmann and Bystrykh, 2013). Popular error-correcting techniques include methods based on false discovery rate statistics (see, for example, Buschmann and Bystrykh, 2013) as well as adaptations of both Hamming codes (see, for example, Hamady et al, 2008;Bystrykh, 2012) and Levenshtein codes (for example, implemented in the software Sequence-Levenshtein (Buschmann and Bystrykh, 2013) and TagGD (Costea et al, 2013)). Algorithms based on the latter are capable of correcting not only substitution errors but also insertion and deletion (indel) errors that is particularly important for sequencing technologies where indels are the main source of error (that is, 454 and Ion Torrent).…”
Section: Quality Assessment (mentioning)
confidence: 99%
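To make the distinction between substitution and indel errors in the statement above concrete, here is a minimal sketch of the classical Levenshtein (edit) distance in Python. It is the textbook dynamic-programming formulation, not the Sequence-Levenshtein variant or any code from the cited software; the function name `levenshtein` is an assumption made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Classical Levenshtein distance: the minimum number of
    substitutions, insertions and deletions that turn a into b."""
    # prev[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string needs i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


print(levenshtein("ACGT", "AGGT"))   # one substitution -> 1
print(levenshtein("ACGT", "AGT"))    # one deletion     -> 1
print(levenshtein("ACGT", "ACGGT"))  # one insertion    -> 1
```

A Hamming distance, by contrast, is only defined for equal-length sequences and counts substitutions alone, which is why it cannot account for the indel cases.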
“…In order to generate distinct 12mer barcode sequences, we took 2,000 20mer primer sequences derived from Eroshenko et al. [18], removed all sequences containing NdeI, KpnI, BtsI-v2, BspQI, EcoRI, XhoI, SpeI, and NotI, and generated all possible 12mer subset sequences. We next screened for self-dimers, GC content between 45% and 55% and a melting temperature between 40°C and 42°C. We further filtered sequences to have a minimum modified Levenshtein distance of 3 between selected barcodes [27]. We then selected the first 384 sequences to be used in oligo designs, with complementary sequences being used to generate the beads.…”
Section: Microbead Barcode Design (mentioning)
confidence: 99%
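The filtering steps in the statement above could be sketched roughly as follows. This is a hedged illustration, not the citing authors' pipeline: it uses the classical Levenshtein distance as a stand-in for the "modified Levenshtein distance" they mention, it omits the restriction-site, self-dimer and melting-temperature screens, and the names `gc_fraction` and `select_barcodes` and their parameters are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Classical edit distance (stand-in for the modified distance)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def gc_fraction(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


def select_barcodes(candidates, min_dist=3, gc_min=0.45, gc_max=0.55, limit=384):
    """Greedily keep candidates, in input order, that pass the GC screen
    and are at least min_dist away from every barcode kept so far."""
    selected = []
    for cand in candidates:
        if not gc_min <= gc_fraction(cand) <= gc_max:
            continue  # outside the GC window
        if all(levenshtein(cand, kept) >= min_dist for kept in selected):
            selected.append(cand)
            if len(selected) == limit:
                break  # "the first 384 sequences"
    return selected


candidates = ["ACGTACGTACGT", "ACGTACGTACGA", "AAAAAAAAAAAA", "GGGCCCAAATTT"]
print(select_barcodes(candidates))
```

In this toy input the second candidate is dropped because it is one substitution away from the first, and the third is dropped by the GC screen. Because the selection is greedy, the result depends on the order of the candidates.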