2014
DOI: 10.1186/1471-2105-15-s15-s7

EasyCluster2: an improved tool for clustering and assembling long transcriptome reads

Abstract: Background: Expressed sequences (e.g. ESTs) are a strong source of evidence to improve gene structures and predict reliable alternative splicing events. When a genome assembly is available, ESTs are suitable to generate gene-oriented clusters through the well-established EasyCluster software. Nowadays, EST-like sequences can be massively produced using Next Generation Sequencing (NGS) technologies. In order to handle genome-scale transcriptome data, we present here EasyCluster2, a reimplementation of EasyCluster…

Cited by 4 publications (2 citation statements)
References 24 publications
“…There exists a plethora of algorithms for de novo clustering of generic nucleotide- [13][14][15][16], and protein-sequences [17,14,18,19]. Several algorithms have also been proposed for clustering of specific nucleotide data such as barcode sequences [20], EST sequences [21][22][23], full-length cDNA [24], RAD-seq [25], genomic or metagenomic short reads [26][27][28][29][30][31], UMI-tagged reads [32], full genomes and metagenomes [33], and contigs from RNA-seq assemblies [34]. However, our clustering problem has unique distinguishing characteristics: transcripts from the same gene have large indels due to alternative splicing, and the error rate and profile differs both between [2] and within [35] reads.…”
Section: Introduction (mentioning)
confidence: 99%
“…Because de novo assembly usually results in an unexpectedly large number of putative transcripts (PT), also known as "transfrags", a significant portion of which are "transcriptional noise", redundant PTs, and assembly artifacts. The redundancy, noise and artifacts can be filtered out based on following the following steps: (1) clustering and collapsing by using tools such as cd-hit-est [118] and EasyCluster2 [119]; (2) removing intronic unspliced PTs after aligning them to the reference genome, if possible; (3) removing sequences that are significantly similar to those of distantly related species in the NCBI NT database; (4) filtering sequences that are poorly supported by RNA-seq reads (Liu H. unpublished). Quality of the resulting transcriptomes can be evaluated in reference-free and reference-dependent ways [105,120].…”
Section: Different Library Construction Methods Can Bring Different B (mentioning)
confidence: 99%
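
The filtering steps listed in the citation statement above can be illustrated with a small sketch. This is a toy stand-in under stated assumptions, not how cd-hit-est or EasyCluster2 work: step (1) is approximated by dropping any putative transcript whose sequence is exactly contained in a longer one, and step (4) by a simple read-count threshold. The `PutativeTranscript` class, the `read_support` field, and the `min_reads` parameter are hypothetical names introduced only for this example. Steps (2) and (3) require genome alignment and database searches and are left out.

```python
from dataclasses import dataclass


@dataclass
class PutativeTranscript:
    name: str
    sequence: str
    read_support: int  # hypothetical: number of RNA-seq reads supporting this PT


def collapse_contained(pts):
    """Toy version of step (1): keep a PT only if its sequence is not
    an exact substring of a PT that has already been kept.
    Real tools cluster by similarity; this only catches exact containment
    and ignores reverse complements."""
    kept = []
    # Visit longest sequences first so shorter, contained ones are dropped.
    for pt in sorted(pts, key=lambda p: len(p.sequence), reverse=True):
        if not any(pt.sequence in k.sequence for k in kept):
            kept.append(pt)
    return kept


def filter_by_support(pts, min_reads):
    """Toy version of step (4): drop PTs with fewer than min_reads reads."""
    return [pt for pt in pts if pt.read_support >= min_reads]


if __name__ == "__main__":
    pts = [
        PutativeTranscript("pt1", "ACGTACGTAA", 12),
        PutativeTranscript("pt2", "GTACG", 30),       # contained in pt1
        PutativeTranscript("pt3", "TTTTGGGGCC", 1),   # poorly supported
        PutativeTranscript("pt4", "ACGTACGTAA", 5),   # exact duplicate of pt1
    ]
    collapsed = collapse_contained(pts)
    final = filter_by_support(collapsed, min_reads=3)
    print([pt.name for pt in final])
```

Sorting by length before the containment check means each sequence is compared only against sequences at least as long as itself, which keeps the logic short at the cost of quadratic comparisons; that is acceptable for a sketch but not for genome-scale data.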