2013
DOI: 10.1016/j.ymeth.2013.06.027

Kraken: A set of tools for quality control and analysis of high-throughput sequence data

Abstract: New sequencing technologies pose significant challenges in terms of data complexity and magnitude. It is essential that efficient software is developed with performance that scales with this growth in sequence information. Here we present a comprehensive and integrated set of tools for the analysis of data from large scale sequencing experiments. It supports adapter detection and removal, demultiplexing of barcodes, paired-end data, a range of read architectures and the efficient removal of sequence redundancy…

Cited by 385 publications (341 citation statements)
References: 16 publications
“…Raw sequences were processed to move the UMI sequences to the read name using "umi_tools extract". Sample barcodes were verified and removed, and the adapter sequence was removed from the 3′ end of reads using the reaper tool from the Kraken package (version 15-065) (Davis et al 2013) with parameters "-3p-head-to-tail 2 -3p-prefix 6/2/1". Reads were mapped to the same genome as the original publication (mm9 for SRSF data set, hg19 for the TDP43 data set) using Bowtie version v1.1.2 (Langmead et al 2009) with the same parameters as the original publications (-v 2 -m 10 -a).…”
Section: Real Data
Citation type: mentioning
confidence: 99%
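
The first step in the statement above, moving the UMI from the read sequence into the read name, can be illustrated with a minimal Python sketch. This is not the umi_tools implementation: the function name `move_umi_to_name`, the fixed 5-base UMI at the 5′ end, and the underscore separator are assumptions made for illustration only.

```python
def move_umi_to_name(name, seq, qual, umi_len=5):
    """Strip the first `umi_len` bases from a read and append them to its name.

    Hypothetical helper: assumes the UMI is a fixed-length prefix of the read.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    umi = seq[:umi_len]
    rest_seq, rest_qual = seq[umi_len:], qual[umi_len:]
    # Keep any header comment (text after the first space) unchanged.
    read_id, _, comment = name.partition(" ")
    new_name = f"{read_id}_{umi}" + (f" {comment}" if comment else "")
    return new_name, rest_seq, rest_qual


record = move_umi_to_name("read1 1:N:0", "ACGTTGGGCCCAAA", "IIIIIHHHHHHHHH")
print(record)
```

Carrying the UMI in the read name keeps it attached to the read through trimming and mapping, so it is still available when duplicates are resolved later.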
“…Duplicate reads, arising due to PCR (when a library is of low complexity) and optical problems (at the stage of sequencing itself on machine), can bias data towards artificially frequent reads and lead to over-estimating a particular variant contribution in the data. Duplicate's removal is also discussed [33,34].…”
Section: Box 3: Base Calls Their Qualities and Reads
Citation type: mentioning
confidence: 99%
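
To make the duplicate-read bias described above concrete, here is a minimal Python sketch that collapses identical read sequences into unique sequences with counts. It assumes duplicates are defined purely by exact sequence identity; it ignores UMIs and mapping positions and is not how any particular tool implements redundancy removal. The function name `collapse_duplicates` is hypothetical.

```python
from collections import Counter


def collapse_duplicates(reads):
    """Return (sequence, count) pairs, most frequent first.

    Assumption: two reads are duplicates only if their sequences are identical.
    """
    counts = Counter(reads)
    # most_common() sorts by count; equal counts keep first-seen order.
    return counts.most_common()


reads = ["ACGT", "TTGA", "ACGT", "ACGT", "TTGA", "GGCC"]
for seq, n in collapse_duplicates(reads):
    print(seq, n)
```

Here three of the six reads are copies of one sequence, so counting raw reads would weight it three times more heavily than the singleton.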
“…Adapter parts might be erroneously sequenced in the beginning of a read, and thus may introduce artificial mutations [35-37]. There are a number of popular tools for adaptor removing from raw sequence reads [34,38-40].…”
Section: Box 3: Base Calls Their Qualities and Reads
Citation type: mentioning
confidence: 99%
“…Presence of adapters was scanned using minion and swan (Kraken package, v15-065) [8]. Preprocessing of the raw sequencing files was performed using cutadapt (v1.11) [9] in following steps: I) Very low-quality ends were trimmed (Phred<5), II) Adapters from both reads of a pair were removed with minimal overlap of 3 bp and maximum of 10% mismatches in a matched sequence (removed adapters: R1-AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC, R2-AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT), III).…”
Section: Bioinformatic Ngs Data Processing
Citation type: mentioning
confidence: 99%
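
The trimming rule quoted above (minimal overlap of 3 bp, at most 10% mismatches in the matched sequence) can be sketched in Python as follows. This is a simplified illustration, not cutadapt's algorithm: it counts substitutions only (no insertions or deletions), takes the leftmost acceptable match, and rounds the mismatch allowance down. The function name `trim_3p_adapter` and its parameter names are hypothetical.

```python
def trim_3p_adapter(read, adapter, min_overlap=3, max_error_rate=0.1):
    """Remove a 3' adapter (or a prefix of it) from the end of a read.

    Simplification: mismatches are counted position by position (Hamming
    distance), so indels within the adapter are not handled.
    """
    for start in range(len(read)):
        # Number of adapter bases that can be compared at this position.
        overlap = min(len(read) - start, len(adapter))
        if overlap < min_overlap:
            break  # overlaps only get shorter from here on
        window = read[start:start + overlap]
        mismatches = sum(a != b for a, b in zip(window, adapter[:overlap]))
        if mismatches <= int(max_error_rate * overlap):
            return read[:start]  # keep only the bases before the adapter
    return read


R1_ADAPTER = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"
insert = "TTGACCATGG"
print(trim_3p_adapter(insert + R1_ADAPTER[:12], R1_ADAPTER))
print(trim_3p_adapter(insert, R1_ADAPTER))
```

With a 3 bp minimum overlap, any read that happens to end in the first three adapter bases is also trimmed; that is the usual trade-off between missing short adapter remnants and removing genuine sequence.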