2016
DOI: 10.1186/s12859-016-1192-5

Removing duplicate reads using graphics processing units

Abstract: Background: During library construction, polymerase chain reaction is used to enrich the DNA before sequencing. Typically, this process generates duplicate read sequences. Removal of these artifacts is mandatory, as they can affect the correct interpretation of data in several analyses. Ideally, duplicate reads should be characterized by identical nucleotide sequences. However, due to sequencing errors, duplicates may also be nearly-identical. Removing nearly-identical duplicates can result in a notable computati…
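The distinction the abstract draws between identical and nearly-identical duplicates can be illustrated with a minimal CPU-only sketch. This is not the GPU method of the paper: the function names, the prefix-bucketing step, and the `max_mismatches` and `prefix_len` parameters are all assumptions made for illustration. Reads are grouped by a short prefix, and a read counts as a duplicate if it is within a given number of mismatches of a read already kept in the same group.

```python
from collections import defaultdict


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length reads."""
    return sum(x != y for x, y in zip(a, b))


def remove_duplicates(reads, max_mismatches=0, prefix_len=8):
    """Keep the first read of each group of (nearly) identical reads.

    Hypothetical helper, not the paper's algorithm. Reads are bucketed by
    their first `prefix_len` bases, so reads that differ inside the prefix
    are never compared (a deliberate simplification).
    """
    buckets = defaultdict(list)  # prefix -> reads kept so far
    kept = []
    for read in reads:
        candidates = buckets[read[:prefix_len]]
        is_duplicate = any(
            len(rep) == len(read) and hamming(rep, read) <= max_mismatches
            for rep in candidates
        )
        if not is_duplicate:
            candidates.append(read)
            kept.append(read)
    return kept


reads = ["ACGTACGTAA", "ACGTACGTAA", "ACGTACGTAT", "TTTTACGTAA"]
exact = remove_duplicates(reads, max_mismatches=0)
near = remove_duplicates(reads, max_mismatches=1)
```

With `max_mismatches=0` only the exact copy is dropped; with `max_mismatches=1` the read that differs in its last base is dropped as well. The pairwise comparison inside each bucket is the step whose cost grows quickly with dataset size.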

Citations: cited by 9 publications (5 citation statements)
References: 21 publications
“…Quality of raw sequencing data was checked with FastQC (version 0.11.5). Then, sequencing reads were trimmed using Trim Galore (version 0.4.4) (Krueger, 2015), while duplicates were identified and removed using GPU-Dup Removal (Manconi et al, 2016). After data pre-processing FastQC reported an acceptable level of quality with a median over 30 and a vast majority of 100 nt-long sequences.…”
Section: RNA Sequencing (mentioning)
confidence: 99%
“…The RNA sequencing of the samples was performed by Illumina HiSeq platform (Biodiversa, Italy) generating about 110 million and 113 million pairs of raw reads (forward and reverse strands) for the three malate samples and the three PE samples, respectively. The quality of raw sequencing data was checked with Trim Galore (version 0.4.4) (URL: http://www.bioinformatics.babraham.ac.uk/projects/trim_galore) to trim the sequencing reads [26], while GPU-Dup Removal was applied to identify and remove duplicates [27]. The quality control resulted in about 92 million and 60 million reads for malate and PE samples, respectively.…”
Section: Methods (mentioning)
confidence: 99%
“…One of the preprocessing steps that reduce the dataset size is removing duplicate reads in the dataset. This step is essential for sequence-based algorithms since duplicate reads affect the algorithm accuracy [4]. Removing duplicate reads may reduce the assembly algorithms consumption of RAM [5].…”
Section: Introduction (mentioning)
confidence: 99%
“…essential for sequence-based algorithms since duplicate reads affect the algorithm accuracy [4]. Removing duplicate reads may reduce the assembly algorithms consumption of RAM [5].…”
mentioning
confidence: 99%