2020
DOI: 10.1109/tcbb.2018.2861380

Improving de novo Assembly Based on Read Classification

Abstract: Due to sequencing bias, sequencing error and repeat problems, the genome assemblies usually contain misarrangements and gaps. When tackling these problems, current assemblers commonly consider the read libraries as a whole and adopt the same strategy to deal with them. In this paper, we present a new pipeline for genome assembly based on reads classification (ARC). ARC classifies reads into three categories according to the frequencies of k-mers they contain. The three categories refer to (1) low depth reads, …
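
As a rough illustration of classifying reads by k-mer frequency, the sketch below counts k-mers across a read set and scores each read by the mean frequency of its k-mers. It is not ARC's implementation: the function names, the choice of the mean as the score, and the `low_threshold` cutoff are all assumptions made for illustration. Only the low depth category named in the abstract is modeled, because the remaining categories are truncated above.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in a collection of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def mean_kmer_frequency(read, counts, k):
    """Average frequency of the k-mers contained in one read."""
    kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
    if not kmers:
        return 0.0  # read shorter than k: no k-mers to score
    return sum(counts[kmer] for kmer in kmers) / len(kmers)


def is_low_depth(read, counts, k, low_threshold):
    """Hypothetical rule: flag a read whose mean k-mer frequency
    falls below low_threshold (an assumed cutoff, not ARC's)."""
    return mean_kmer_frequency(read, counts, k) < low_threshold


reads = ["ACGTACGT", "ACGTACGT", "TTTTGGGG"]
counts = count_kmers(reads, k=4)
flags = [is_low_depth(r, counts, k=4, low_threshold=2.0) for r in reads]
```

In this toy input the third read shares no k-mers with the others, so it is the only one flagged. A real pipeline would derive the cutoff from the k-mer frequency distribution rather than fix it by hand.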

Cited by 18 publications (11 citation statements)
References 29 publications
“…9). RepAHR estimates the average read coverage using a method similar to that in literature [28]. The calculation principle is shown as follows: Where p is the horizontal coordinate of the main peak in the k-mer frequency distribution histogram, length is the average length of the input NGS reads, k is the k-mer length used in estimation which is settled to 15 by default, and Cov is the average read coverage estimated.…”
Section: Methods (mentioning)
confidence: 99%
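
The formula itself did not survive in the quoted statement. A relation commonly used for this kind of estimate, assumed here and not confirmed as RepAHR's exact expression, is Cov = p × length / (length − k + 1): a read of the given length contains length − k + 1 k-mers, so the k-mer peak depth p is scaled back up to per-base read coverage. A minimal sketch under that assumption, with a hypothetical function name:

```python
def estimate_read_coverage(peak_depth, read_length, k=15):
    """Estimate average read coverage from the main k-mer peak.

    Assumed relation (not taken from the quoted text):
        Cov = p * length / (length - k + 1)
    peak_depth  -- p, x-coordinate of the main peak in the k-mer histogram
    read_length -- average length of the input reads
    k           -- k-mer length (15 is the default stated in the quote)
    """
    kmers_per_read = read_length - k + 1
    if kmers_per_read <= 0:
        raise ValueError("read_length must be at least k")
    return peak_depth * read_length / kmers_per_read


cov = estimate_read_coverage(peak_depth=43, read_length=100, k=15)
```

With these illustrative inputs, each read holds 86 k-mers, so a peak depth of 43 corresponds to an estimated coverage of 50.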
“…9). RepAHR estimates the average read coverage using a method similar to that in literature [28]. The calculation principle is shown as follows:…”
Section: Estimating the Average Read Coverage (mentioning)
confidence: 99%
“…Although considerable third generation sequencing data has been produced, due to the higher cost per base and higher sequencing errors, NGS sequencing data still plays an important role in tackling an increasing list of biological problems. The de novo genome assembly is a fundamental process for computational biology (Schatz et al, 2010), which drives the generation of many assemblers to complete the construction of genome sequences, such as Velvet (Zerbino and Birney, 2008), ABySS (Simpson et al, 2009), ALLPATHS-LG (Gnerre and Jaffe, 2011), SOAPdenovo (Li et al, 2010), EPGA2 (Luo et al, 2015), Miniasm (Li, 2015), BOSS, SCOP (Li et al, 2018a), ARC (Liao et al, 2018), iLSLS (Li et al, 2018b), MEC, EPGA-SC (Liao et al, 2019a), PE-Trimmer (Liao et al, 2019b), and so on.…”
Section: Introduction (mentioning)
confidence: 99%
“…Read sequencing from next-generation sequencing (NGS) technology (Miller et al, 2010), is usually short, i.e., only a few hundred base pairs in length. Short reads commonly cannot be used to solve problems caused by long repetitive regions (Liao et al, 2020). In addition, NGS polymers commonly lead to some GC bias, which will affect the correctness of the genome assembly (Farrer et al, 2009; Luo et al, 2012).…”
Section: Introduction (mentioning)
confidence: 99%