2013
DOI: 10.1093/bioinformatics/btt219

IDBA-tran: a more robust de novo de Bruijn graph assembler for transcriptomes with uneven expression levels

Abstract: Motivation: RNA sequencing based on next-generation sequencing technology is effective for analyzing transcriptomes. Like de novo genome assembly, de novo transcriptome assembly does not rely on any reference genome or additional annotation information, but is more difficult. In particular, isoforms can have very uneven expression levels (e.g. 1:100), which make it very difficult to identify low-expressed isoforms. One challenge is to remove erroneous vertices/edges with high multiplicity (produced by high-exp…

Citations: cited by 201 publications (186 citation statements)
References: 21 publications
“…The quality of the filtered libraries was assessed with the software fastx_toolkit (http://hannonlab.cshl.edu/fastx_toolkit) with respect to the quality score of the bases, the GC‐content, and the read length. The filtered transcriptome datasets were reconstructed into contiguous cDNA sequences with IDBA‐tran, Version 1.1.1 (Peng et al., 2013) with the parameters “–mink 2 –maxk 60 –step 5 –max_count 3”. The quantitative quality assessment of the reconstructed datasets regarding the number of transcripts, number of total bases reconstructed, N50 value, and GC content was carried out using the software QUAST, Version 2.3 (Gurevich et al., 2013).…”
Section: Methods (mentioning)
confidence: 99%
“…However, de Bruijn graph-based strategy between de novo genome and transcriptome assembly is slightly modified because of the following reasons: (i) while the DNA sequencing depth is expected to be uniform across the genome (except in repetitive regions), the sequencing depth of transcripts can vary considerably, (ii) Genome assembly graph is considered as linear (theoretically one graph for each chromosome), but due to alternative splicing, transcriptome assembly is more complex than genome and requires a graph to represent the multiple alternative transcripts per locus [1,21]. By considering these challenges, several de novo assembly tools such as Trinity [1], SOAPdenovo-Trans [22], Trans-AbySS [23], Oases [24], IDBA-Tran [25], BinPacker [26] and Bridger [27] have been developed so far (Box 1). Most of these tools, which are initially developed for de novo genome assembly (except for Trinity) use de Bruijn graph-based assembly strategy and have their own pros and cons in transcript reconstruction.…”
Section: A Brief Glance At De Novo Transcript Assemblers (mentioning)
confidence: 99%
“…where the same genetic transcripts usually form a single component [25]. IDBA-Tran modulates the products of the k-mers of the same composition with a very normal distribution, which depends on the expression levels of the corresponding isoforms.…”
Section: Applications Of Rna-seq and Omics Strategies -From Microorga… (mentioning)
confidence: 99%
“…The de Bruijn graph approach, e.g., Velvet [11], Abyss [12], IDBA [13][14][15][16][17][18], Euler-SR [19,20] and AllPaths [21], constructs a de Bruijn graph for the reads in which each vertex represents a length-k substring (k-mer) in a read and there is a directed edge from vertex u to vertex v if u and v are consecutive k-mers in a read, i.e., the last k-1 nucleotides of the k-mer represented by u is the same as the first k-1 nucleotides of the k-mer represented by v. Similar to the string graph approach, maximal paths without branches in the graph corresponding to contigs are outputted. There are two main advantages against the other two approaches.…”
Section: Existing Approaches (mentioning)
confidence: 99%
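
The de Bruijn graph construction described in the citation statement above can be illustrated with a minimal sketch. It is not code from IDBA or any of the cited assemblers: the function names `build_de_bruijn_graph` and `unbranched_contigs` and the two toy reads are assumptions made for illustration only, and the sketch ignores reverse complements, sequencing errors, k-mer multiplicities, and isolated cycles.

```python
from collections import defaultdict


def build_de_bruijn_graph(reads, k):
    """Map each k-mer (vertex) to the set of k-mers that follow it in a read."""
    graph = defaultdict(set)
    for read in reads:
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        for kmer in kmers:
            _ = graph[kmer]  # register the vertex even if it has no successor
        # Consecutive k-mers in a read overlap by k-1 nucleotides: edge u -> v.
        for u, v in zip(kmers, kmers[1:]):
            graph[u].add(v)
    return dict(graph)


def unbranched_contigs(graph):
    """Spell out maximal paths without branches as contig strings.

    Hypothetical helper: paths start at every vertex that is not
    1-in/1-out and extend through 1-in/1-out vertices only.
    """
    indegree = defaultdict(int)
    for successors in graph.values():
        for v in successors:
            indegree[v] += 1

    def is_simple(v):
        return indegree[v] == 1 and len(graph[v]) == 1

    contigs = []
    for u in sorted(graph):
        if is_simple(u):
            continue  # interior vertex of some path, not a starting point
        for v in sorted(graph[u]):
            path = [u, v]
            while is_simple(path[-1]):
                path.append(next(iter(graph[path[-1]])))
            # First k-mer in full, then the last nucleotide of each later k-mer.
            contigs.append(path[0] + "".join(p[-1] for p in path[1:]))
    return contigs


# Toy input (made up): two overlapping reads, k = 3.
reads = ["ATGGCGT", "GGCGTGC"]
graph = build_de_bruijn_graph(reads, 3)
contigs = unbranched_contigs(graph)
```

With these two toy reads the graph is a single unbranched chain of seven 3-mers, so one contig covering both reads is produced. A vertex with more than one predecessor or successor would end the current path and start new ones.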
“…However, because of the existence of erroneous reads, the components representing different genes may be connected in the de Bruijn graph and cannot be separately easily. IDBA-Tran [18] models the multiplicities of k-mers in the same component by a multi-normal distribution which depends on the expression levels of the corresponding isoforms. Based on the multi-normal distribution, erroneous k-mers in the component with relative low multiplicity can be determined and be removed.…”
Section: Solutions For Assembling Transcriptomic Data (mentioning)
confidence: 99%