2018
DOI: 10.1101/463463
Preprint

De novo clustering of long-read transcriptome data using a greedy, quality-value based algorithm

Abstract: Long-read sequencing of transcripts with PacBio Iso-Seq and Oxford Nanopore Technologies has proven to be central to the study of complex isoform landscapes in many organisms. However, current de novo transcript reconstruction algorithms from long-read data are limited, leaving the potential of these technologies unfulfilled. A common bottleneck is the dearth of scalable and accurate algorithms for clustering long reads according to their gene family of origin. To address this challenge, we develop isONclust, …

Citations: cited by 22 publications (20 citation statements)
References: 44 publications
“…Furthermore, there are applications where the k-mer set is not related to sequence read data at all, e.g. a universal hitting set [26], a chromosome-specific reference dictionary [27], or a winnowed min-hash sketch (for example as in [28], or see [29,30] for a survey).…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…To generate the consensus sequence for each sample, all reads were first clustered using isONclust v0.0.4 (Sahlin & Medvedev, 2018). We chose isONclust over other clustering tools previously used in nanopore-based DNA barcoding pipelines, such as VSEARCH (implemented in ONTrack, Maestri et al, 2019), as it was specifically designed to work with error-prone long-read data and thus should be less affected by read errors and more efficient in cluster formation.…”
Section: Read Clustering and Consensus Sequence Generation
Citation type: mentioning
confidence: 99%
“…While these algorithms continue to be widely leveraged in bioinformatics, they are even more prevalent for long-read (PacBio/Oxford Nanopore) analyses because longer strings are more amenable to compaction. As such, several long-read based mappers (Li, 2016; Popic and Batzoglou, 2017; Jain et al, 2018; Li, 2018), genome assemblers (Berlin et al, 2015; Koren et al, 2017; Chin and Khalak, 2019; Shafin et al, 2019; Kundu et al, 2019), metagenomic read classifiers (Dilthey et al, 2019), transcriptomic tools (Sahlin and Medvedev, 2019; Sahlin et al, 2020) employ either minimizer- or MinHash-based sequence comparison.…”
Section: Introduction
Citation type: mentioning
confidence: 99%