2014 IEEE International Parallel & Distributed Processing Symposium Workshops 2014
DOI: 10.1109/ipdpsw.2014.67

Parallelization of the Trinity Pipeline for De Novo Transcriptome Assembly

Abstract: This paper details a distributed-memory implementation of Chrysalis, part of the popular Trinity workflow used for de novo transcriptome assembly. We have implemented changes to Chrysalis, which was previously multi-threaded for shared-memory architectures, to change it to a hybrid implementation which uses both MPI and OpenMP. With the new hybrid implementation, we report speedups of about a factor of twenty for both GraphFromFasta and ReadsToTranscripts on an iDataPlex cluster for a sugarbeet dataset containi…

Citations: cited by 6 publications (5 citation statements)
References: 19 publications
“…Inchworm initially creates a large hashmap table to store all unique k-mers from the input RNA-seq reads, and then it selects k-mers from the hashmap to construct linear contigs using a greedy k-mer extension approach. In our previous study [28], we confirmed that the Inchworm module of Trinity requires relatively high physical memory usage.…”
Section: Introduction (supporting)
confidence: 72%
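
The greedy k-mer extension described in the statement above can be illustrated with a minimal Python sketch. It is not Trinity's Inchworm code: the function names (`count_kmers`, `extend`, `greedy_contigs`) are hypothetical, and the sketch assumes `k >= 2` and ignores reverse complements, error filtering, and memory optimizations. It only shows the general idea of storing every k-mer in a hash map and then repeatedly extending a seed with the most abundant unused overlapping k-mer.

```python
from collections import Counter


def count_kmers(reads, k):
    """Build the k-mer hash map: k-mer -> number of occurrences in the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def extend(contig, counts, used, k, to_right):
    """Greedily extend a contig in one direction, one base at a time."""
    while True:
        if to_right:
            overlap = contig[-(k - 1):]          # last k-1 bases of the contig
            candidates = [overlap + base for base in "ACGT"]
        else:
            overlap = contig[:k - 1]             # first k-1 bases of the contig
            candidates = [base + overlap for base in "ACGT"]
        # Keep only k-mers that were observed and not yet consumed.
        candidates = [c for c in candidates if c in counts and c not in used]
        if not candidates:
            return contig
        best = max(candidates, key=counts.__getitem__)  # most abundant wins
        used.add(best)
        contig = contig + best[-1] if to_right else best[0] + contig


def greedy_contigs(reads, k):
    """Assemble linear contigs by seeding from the most abundant unused k-mers."""
    counts = count_kmers(reads, k)
    used = set()
    contigs = []
    for seed, _ in counts.most_common():
        if seed in used:
            continue
        used.add(seed)
        contig = extend(seed, counts, used, k, to_right=True)
        contig = extend(contig, counts, used, k, to_right=False)
        contigs.append(contig)
    return contigs
```

For example, `greedy_contigs(["ATGGCG", "GGCGTA"], 3)` joins the two overlapping reads into the single contig `ATGGCGTA`. Because the hash map holds every distinct k-mer at once, its size grows with the number of unique k-mers in the input, which is consistent with the high memory usage the citing authors attribute to this step.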
“…Contigs from all clusters are pooled together and passed to the Chrysalis module for re-clustering according to the original Trinity scheme. The Inchworm module of Trinity is known to be the most memory-intensive step [28], and is often a barrier to processing large or complex RNA-Seq datasets. In our scheme, the computational load is passed to the pre-clustering step, where the well-established MapReduce procedure allows the load to be distributed over a commodity compute cluster.…”
Section: Discussion (mentioning)
confidence: 99%
“…Butterfly then reconstructs the full-length transcripts based on the de Bruijn graphs from Chrysalis, taking into account possible alternative splicing. In our previous study [28], we identified the Chrysalis module as the main bottleneck in terms of runtime, and alleviated this bottleneck by parallelising the processing over multiple compute nodes using MPI. We also confirmed that the Inchworm module of Trinity requires relatively high physical memory usage.…”
Section: Introduction (mentioning)
confidence: 99%
“…Due to enormous volume of the data, transcriptome assembly is complex and requires a lot of computational time and resources e.g. only 10's of GB of data can take days to compute a transcriptome assembly [23] and can easily reach peta-byte level [24]. These NGS datasets have the inherent problems of storage and transmission due to their large volume and velocity.…”
Section: A Big NGS Data and Computational Challenges (mentioning)
confidence: 99%