2017
DOI: 10.1038/nbt.4020

Accurate assembly of transcripts through phase-preserving graph decomposition

Abstract: We introduce Scallop, an accurate reference-based transcript assembler that improves reconstruction of multi-exon and lowly expressed transcripts. Scallop preserves long-range phasing paths extracted from reads, while producing a parsimonious set of transcripts and minimizing coverage deviation. On 10 human RNA-seq samples, Scallop produces 34.5% and 36.3% more correct multi-exon transcripts than StringTie and TransComb, and respectively identifies 67.5% and 52.3% more lowly expressed transcripts. Scallop achi…

Cited by 189 publications (224 citation statements)
References 23 publications
“…In this work, as opposed to (author?) [5], we choose to not separately evaluate multi-exon transcripts and single-exon transcripts but rather maximize a combined AUC.…”
Section: Analyzing Parameter Behavior (mentioning)
confidence: 99%
“…Transcriptome assembly takes an RNA-Seq sample and a reference genome as input and reconstructs the set of transcripts that are present. Common tools for reference-based transcript assembly include Cufflinks [2], StringTie [3], TransComb [4], and Scallop [5]. Reference-based assemblers first align reads to the reference genome using a tool such as HISAT [6], STAR [7], TopHat [8], or SpliceMap [9]. Using the read splice locations (the positions where a read maps to non-neighboring locations on a genome), the assembler constructs the exons and splice-junctions of each transcript.…”
Section: Introduction (mentioning)
confidence: 99%
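
The "read splice locations" mentioned in the statement above can be illustrated with a minimal sketch. It assumes alignments are given as a SAM-style 1-based start position plus a CIGAR string, where the `N` operation marks a region skipped on the reference (an intron for spliced RNA-seq reads). The function name `splice_junctions` and the example coordinates are hypothetical; this is not how Scallop or any of the cited assemblers implements the step.

```python
import re

# CIGAR operations that consume reference bases, per the SAM specification.
REF_CONSUMING = set("MDN=X")
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def splice_junctions(pos, cigar):
    """Return (intron_start, intron_end) pairs, 1-based inclusive, for one alignment.

    pos   -- 1-based leftmost reference position (the SAM POS field)
    cigar -- CIGAR string, e.g. "50M1000N50M"
    """
    junctions = []
    ref = pos  # next reference position to be consumed
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op == "N":
            # The skipped region spans ref .. ref + length - 1.
            junctions.append((ref, ref + length - 1))
        if op in REF_CONSUMING:
            ref += length
    return junctions


if __name__ == "__main__":
    # A read starting at position 100: 50 aligned bases, a 1000-base gap, 50 more bases.
    print(splice_junctions(100, "50M1000N50M"))  # [(150, 1149)]
```

Collecting these intervals across all reads gives candidate intron boundaries; the aligned blocks between them are the candidate exon segments an assembler would then connect into transcripts.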
“…This is complicated by the presence of paralogous genes and transcripts with many isoforms that largely overlap one another, and as a result this approach produces highly fragmented and error-prone transcriptomes. Reference-guided assemblers such as Cufflinks (Trapnell, Williams et al 2010), Bayesembler (Maretty, Sibbesen et al 2014), StringTie (Pertea, Pertea et al 2015), TransComb (Liu, Yu et al 2016), and Scallop (Shao and Kingsford 2017) take advantage of an existing genome to which the RNA-seq reads are first aligned using a spliced aligner such as HISAT (Kim, Langmead et al 2015) or STAR (Dobin, Davis et al 2013).…”
Section: Introduction (mentioning)
confidence: 99%
“…No published transcript assembler has been adapted and systematically tested on the challenges of long-read transcript assembly yet. Aiming to handle these challenges, we developed a long-read transcript assembler called Scallop-LR, evolved from Scallop, an accurate short-read transcript assembler (Shao and Kingsford, 2017). Scallop-LR is designed for PacBio long reads.…”
Section: Introduction (mentioning)
confidence: 99%