2017
DOI: 10.1093/gigascience/gix120

SOAPnuke: a MapReduce acceleration-supported software for integrated quality control and preprocessing of high-throughput sequencing data

Abstract: Quality control (QC) and preprocessing are essential steps for sequencing data analysis to ensure the accuracy of results. However, existing tools cannot provide a satisfying solution with integrated comprehensive functions, proper architectures, and highly scalable acceleration. In this article, we demonstrate SOAPnuke as a tool with abundant functions for a “QC-Preprocess-QC” workflow and MapReduce acceleration framework. Four modules with different preprocessing functions are designed for processing dataset…

Citations: cited by 1,601 publications (883 citation statements)
References: 36 publications
“…Two PE libraries with insert sizes of 270 and 500 bp were constructed for PE150 sequencing on the Illumina Hiseq Xten platform and generated a total of 133 Gb raw reads (Table S1). We filtered the raw reads by soapnuke (Chen et al, ) with parameters -n 0.1 -l 20 -q 0.4 -d -M 1 -Q 2 -i -G --seqType 1. After removing low-quality and redundant reads, 114 Gb of PE (2 × 150) clean reads was obtained (Table S2).…”
Section: Results (mentioning)
confidence: 99%
“…Adapter sequences and low‐quality reads were trimmed from raw reads with SOAPnuke v. 1.5.2 (Chen et al, ) to generate clean reads. Trimmed clean reads were mapped to the T. urticae transcriptome assembly ASM23943v1 (RefSeq assembly accession: GCF_000239435.1; updated 25/5/2018; Grbić et al, ) using Salmon v. 0.9.1 (Patro et al, ) with default parameters.…”
Section: Methods (mentioning)
confidence: 99%
“…S1). In order to reduce the effect of sequencing errors on the assembly, we used SOAPnuke v.1.5.6 (SOAPnuke, RRID:SCR_015025) [16] to filter out low-quality reads with adapters, high base error rate, and highly unknown base proportion, and obtained 177 Gb (272×) of clean data.…”
Section: Data Description (mentioning)
confidence: 99%