Published 2020 | DOI: 10.1093/bioinformatics/btaa140

MESA: automated assessment of synthetic DNA fragments and simulation of DNA synthesis, storage, sequencing and PCR errors

Abstract (Summary): The development of de novo DNA synthesis, polymerase chain reaction (PCR), DNA sequencing and molecular cloning gave researchers unprecedented control over DNA and DNA-mediated processes. To reduce the error probabilities of these techniques, DNA composition has to adhere to method-dependent restrictions. To comply with such restrictions, a synthetic DNA fragment is often adjusted manually or by using custom-made scripts. In this article, we present MESA (Mosla Error Simulator), a web…
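The abstract notes that synthetic DNA must satisfy method-dependent composition restrictions. As a minimal sketch, assuming two common rule types (a GC-content window and a maximum homopolymer length), such a check could look like the following. The function names and the thresholds (40 to 60% GC, runs of at most 4 identical bases) are hypothetical illustration values, not MESA's defaults or its actual scoring model.

```python
from typing import List

def gc_content(seq: str) -> float:
    """Fraction of G and C bases (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)

def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of a single repeated base."""
    longest, run, prev = 0, 0, ""
    for base in seq.upper():
        run = run + 1 if base == prev else 1
        prev = base
        longest = max(longest, run)
    return longest

def violations(seq: str, gc_min: float = 0.4, gc_max: float = 0.6,
               max_run: int = 4) -> List[str]:
    """List rule violations; thresholds are hypothetical, not MESA defaults."""
    problems = []
    gc = gc_content(seq)
    if not gc_min <= gc <= gc_max:
        problems.append(f"GC content {gc:.2f} outside [{gc_min}, {gc_max}]")
    run = longest_homopolymer(seq)
    if run > max_run:
        problems.append(f"homopolymer run of {run} exceeds {max_run}")
    return problems

print(violations("ACGTACGTAC"))  # → []
print(violations("AAAAAAGC"))    # GC too low and a 6-base run
```

Returning a list of human-readable problems, rather than a single pass/fail flag, mirrors the idea of telling the user which restriction a fragment breaks so it can be adjusted.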

Cited by 32 publications (47 citation statements); references 23 publications.
“…Next, we compare several classifiers (Fig. 3) including KRAKEN2, SINTAX, IDTAXA, the naïve Bayesian classifier implemented in DADA2 (DADA2-NBC), and the naïve Bayes scikit-learn classifier implemented in QIIME2 (QIIME2-NB) for their ability in accurately annotating query sequences in simQS-V3V4-i to simQS-V3V4-iii —simulated short-read data sets generated by introducing realistic error rates (∼1%) to bee-associated V3-V4 sequences (randomly sampled from the parent database BEEx-FL-refs during in silico PCR) using established Mosla Error Simulator (MESA) software (56) (see Materials and Methods section for more details).…”
Section: Results | Citation type: mentioning | Confidence: 99%
“…Benchmarks performed on error-free sequence queries derived from an identical database as is being used to classify the queries is expected to result in unrealistically inflated performance rates (1). To enable more realistic testing conditions during experiments, error rates of approximately ∼1% were introduced to the sequence representatives derived from BEEx-FL-refs using established Mosla Error Simulator (MESA) software (56). Briefly, the ErrASE synthesis method was chosen with the default sequencing method set for paired-end Illumina MiSeq alongside a standard 30-cycle traditional PCR amplification step and a 12-month sample storage period.…”
Section: Methods | Citation type: mentioning | Confidence: 99%
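The benchmark described above relies on MESA to add realistic (~1%) errors to reference sequences. MESA models synthesis, storage, PCR and sequencing separately; the sketch below swaps in a much simpler uniform model purely for illustration: each position independently suffers a substitution, an insertion or a deletion with probability `rate`. The function `mutate` and its parameters are hypothetical and do not reproduce the ErrASE, MiSeq, PCR-cycle or storage settings quoted above.

```python
import random
from typing import Optional

BASES = "ACGT"

def mutate(seq: str, rate: float = 0.01, rng: Optional[random.Random] = None) -> str:
    """Return a copy of seq with uniform random errors (simplified model, not MESA's).

    Each position is hit with probability `rate`; a hit is a substitution,
    an insertion after the base, or a deletion, chosen with equal probability.
    """
    if rng is None:
        rng = random.Random()
    out = []
    for base in seq:
        if rng.random() >= rate:
            out.append(base)  # no error at this position
            continue
        kind = rng.choice(("sub", "ins", "del"))
        if kind == "sub":
            # replace with one of the other bases
            out.append(rng.choice([b for b in BASES if b != base]))
        elif kind == "ins":
            # keep the base and insert a random base after it
            out.append(base)
            out.append(rng.choice(BASES))
        # kind == "del": the base is simply not copied
    return "".join(out)

reference = "ACGT" * 50
read = mutate(reference, rate=0.01, rng=random.Random(42))
```

Passing a seeded `random.Random` makes a simulated data set reproducible, which helps when several classifiers must be compared on exactly the same error-laden queries.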
“…The highest customizability is obtained by using the MESA [32] API. Since MESA as a web tool for the automated assessment of synthetic DNA fragments and simulation of DNA synthesis, storage, sequencing, and PCR errors does not only allow user-defined configurations but also offers a REST-API, MESA allows a fine-grained and correct assessment of error probabilities per packet.…”
Section: Methods | Citation type: mentioning | Confidence: 99%
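The quote credits MESA with per-packet error probabilities. MESA's own scoring is not reproduced here; as a minimal sketch, assuming errors at different positions are independent and the per-position error probabilities p_i are already known, the probability that a packet contains at least one error is 1 - prod(1 - p_i). The function name `packet_error_probability` is hypothetical.

```python
import math
from typing import Iterable

def packet_error_probability(position_errors: Iterable[float]) -> float:
    """P(packet has >= 1 error), assuming independent per-position errors."""
    probs = list(position_errors)
    if any(not 0.0 <= p <= 1.0 for p in probs):
        raise ValueError("each per-position probability must lie in [0, 1]")
    # complement of the probability that every position is error-free
    return 1.0 - math.prod(1.0 - p for p in probs)

# a 100-nt packet with a uniform 1% per-base error rate
print(round(packet_error_probability([0.01] * 100), 3))  # → 0.634
```

Under this independence assumption, even a 1% per-base rate leaves most 100-nt packets with at least one error, which illustrates how sharply per-packet figures can differ from per-base rates.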
“…Deoxyribonucleic acid sequences containing consecutive repetitive subsequences are more likely to be misaligned during sequencing and this results in data-reading errors (Myers, 2007, 'Tandem Repeats and Morphological Variation', Learn Science at Scitable). Sequences containing consecutive repetitive subsequences easily produce polymerase slippage at the synthesis phase (Schwarz et al, 2020). Two DNA sequences can easily become dislocated in the repetitive region.…”
Section: Non-adjacent Subsequence | Citation type: mentioning | Confidence: 99%
“…Therefore, it is vital to study the sources of errors that impact DNA storage and coding. Earlier studies (Myers, 2007, 'Tandem Repeats and Morphological Variation', Learn Science at Scitable; Kovacevic and Tan, 2018; Schwarz et al, 2020) revealed that the error rate in the storage process increases if there are consecutive repetitive subsequences in the sequence. Hence, we propose a novel constraint (non-adjacent subsequence constraint) to avoid the occurrence of this sequence.…”
Section: Introduction | Citation type: mentioning | Confidence: 99%
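The non-adjacent subsequence constraint proposed in the quote rules out consecutive repetitive subsequences. Its exact formulation is not given here, so the sketch below assumes one reading: a sequence violates the constraint if any substring of length at least `min_len` is immediately followed by an identical copy (a tandem repeat such as ACGACG). The function name `has_adjacent_repeat` and the `min_len` default of 2 (which leaves single-base homopolymer runs to a separate rule) are assumptions.

```python
from typing import Optional

def has_adjacent_repeat(seq: str, min_len: int = 2, max_len: Optional[int] = None) -> bool:
    """True if some substring of length min_len..max_len is immediately repeated."""
    n = len(seq)
    if max_len is None:
        max_len = n // 2  # a repeat unit cannot be longer than half the sequence
    for k in range(min_len, max_len + 1):
        for i in range(n - 2 * k + 1):
            # compare the unit starting at i with the unit directly after it
            if seq[i:i + k] == seq[i + k:i + 2 * k]:
                return True
    return False

print(has_adjacent_repeat("ACGACG"))    # → True  (ACG repeated)
print(has_adjacent_repeat("ACGTTGCA"))  # → False (only the single-base run TT)
```

This is a brute-force scan over all unit lengths and positions, which is adequate for oligo-length fragments; a sequence generator enforcing the constraint would simply reject candidates for which the function returns True.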