Coordinating Computation and I/O in Massively Parallel Sequence Search

Lin, Heshan; Ma, Xiaosong; Feng, Wu-chun; Samatova, Nagiza F.

doi:10.1109/tpds.2010.101

Cited by 46 publications

(35 citation statements)

References 42 publications

“…NCBI BLAST+ [8] uses pthreads to speedup BLAST on a multicore CPU. On CPU clusters, TurboBLAST [24], ScalaBLAST [10], and mpiBLAST [14] have been proposed. Among them, mpiBLAST is a widely-used one based on NCBI BLAST.…”

Section: Related Work (mentioning)

confidence: 99%

cuBLASTP: Fine-Grained Parallelization of Protein Sequence Search on CPU+GPU

Zhang; Wang; Feng. 2017. IEEE/ACM Trans. Comput. Biol. and Bioinf.

Abstract: BLAST, short for Basic Local Alignment Search Tool, is a ubiquitous tool used in the life sciences for pairwise sequence search. However, with the advent of next-generation sequencing (NGS), whether at the outset or downstream from NGS, the exponential growth of sequence databases is outstripping our ability to analyze the data. While recent studies have utilized the graphics processing unit (GPU) to speed up the BLAST algorithm for searching protein sequences (i.e., BLASTP), these studies use coarse-grained parallelism, where one sequence alignment is mapped to only one thread. Such an approach does not efficiently utilize the capabilities of a GPU, particularly due to the irregularity of BLASTP in both execution paths and memory-access patterns. To address the above shortcomings, we present a fine-grained approach to parallelize BLASTP, where each individual phase of sequence search is mapped to many threads on a GPU. This approach, which we refer to as cuBLASTP, reorders data-access patterns and reduces divergent branches of the most time-consuming phases (i.e., hit detection and ungapped extension). In addition, cuBLASTP optimizes the remaining phases (i.e., gapped extension and alignment with trace back) on a multicore CPU and overlaps their execution with the phases running on the GPU.

“…Improvements in its execution speed will result in significant impact in the practice of genome studies. Therefore, important efforts have been invested in accelerating it for different computers systems (to cite a few, mpiBLAST [6,12], CloudBLAST [28], AzureBlast [29], GPU-Blast [30], and scalaBLAST 2.0 [31]). These Blast parallelisations require computer expertise to produce and adapt a particular Blast code and are tightly bonded to the software version included in the parallelised/distributed code [31].…”

Section: Related Work (mentioning)

confidence: 99%

“…Paired sequence comparison is inherently a parallel process in which many sequence pairs can be analysed at the same time by means of functions or algorithms that are iteratively performed over sequences. This is impelling the parallelisation of sequence comparison algorithms [5][6][7][8][9] as well as other bioinformatic algorithms [10,11].…”

Section: Introduction (mentioning)

confidence: 99%

SCBI_MapReduce, a New Ruby Task-Farm Skeleton for Automated Parallelisation and Distribution in Chunks of Sequences: The Implementation of a Boosted Blast+

Guerrero-Fernández; Falgueras; Claros. 2013. Computational Biology Journal

Current genomic analyses often require the managing and comparison of big data using desktop bioinformatic software that was not developed with multicore distribution in mind. The task-farm SCBI MapReduce is intended to simplify the trivial parallelisation and distribution of new and legacy software and scripts for biologists who are interested in using computers but are not skilled programmers. In the case of legacy applications, there is no need to modify or rewrite the source code. It can be used from multicore workstations to heterogeneous grids. Tests have demonstrated that speed-up scales almost linearly and that distribution in small chunks increases it. It is also shown that SCBI MapReduce takes advantage of shared storage when necessary, is fault-tolerant, allows for resuming aborted jobs, does not need special hardware or virtual machine support, and provides the same results as a parallelised, legacy software. The same is true for interrupted and relaunched jobs. As proof-of-concept, distribution of a compiled version of Blast+ in the SCBI Distributed Blast gem is given, indicating that other blast binaries can be used while maintaining the same SCBI Distributed Blast code. Therefore, SCBI MapReduce suits most parallelisation and distribution needs in, for example, gene and genome studies.
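
The task-farm idea in this abstract (split the input sequences into small chunks, hand each chunk to a worker, and reassemble the results in input order) can be illustrated with a minimal Python sketch. This is not the SCBI MapReduce API, which is a Ruby gem; the names `chunked`, `task_farm`, and `gc_count` are hypothetical, and a thread pool stands in for the workers. In a real setting each worker would more plausibly launch an external program on its chunk.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List, Sequence


def chunked(items: Sequence, chunk_size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most chunk_size items."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]


def task_farm(items: Sequence, work: Callable, chunk_size: int = 2,
              workers: int = 4) -> List:
    """Apply `work` to every item, one chunk per task, keeping input order."""
    def run_chunk(chunk: Sequence) -> List:
        return [work(item) for item in chunk]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns chunk results in submission order,
        # so flattening them restores the original item order.
        per_chunk = pool.map(run_chunk, chunked(items, chunk_size))
        return [result for chunk_result in per_chunk for result in chunk_result]


def gc_count(seq: str) -> int:
    """Hypothetical stand-in for a per-sequence job: count G and C bases."""
    return seq.count("G") + seq.count("C")


if __name__ == "__main__":
    sequences = ["ACGT", "GGCC", "AT", "GATTACA", "CCC"]
    print(task_farm(sequences, gc_count, chunk_size=2))
```

Smaller chunks give the pool more tasks to balance across workers, at the cost of more scheduling overhead per task; the sketch leaves that trade-off to the `chunk_size` parameter.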

“…On multi-core platforms, the BLAST implementation from National Center for Biotechnology Information (NCBI) has been parallelized with pthreads. On cluster platforms, there are parallel implementations such as TurboBLAST [4], ScalaBLAST [24], and mpiBLAST [6], [13], [14]. Among them, mpiBLAST is a widely used parallelization of NCBI BLAST.…”

Section: Related Work (mentioning)

confidence: 99%

Accelerating Protein Sequence Search in a Heterogeneous Computing System

Xiao; Lin; Feng. 2011. 2011 IEEE International Parallel & Distributed Processing Symposium

Self Cite

Abstract: The "Basic Local Alignment Search Tool" (BLAST) is arguably the most widely used computational tool in bioinformatics. However, the computational power required for routine BLAST analysis has been outstripping Moore's Law due to the exponential growth in the size of the genomic sequence databases that BLAST searches on. To address the above issue, we propose the design and optimization of the BLAST algorithm for searching protein sequences (i.e., BLASTP) in a heterogeneous computing system. The end result is a BLASTP implementation that delivers a seven-fold speedup over the sequential BLASTP for the most computationally intensive phase (i.e., hit detection and ungapped extension) on an NVIDIA Fermi C2050 GPU. In addition, when pipelining the processing on a dual-core CPU and the NVIDIA Fermi GPU, our implementation can achieve a six-fold speedup for the overall program execution.
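
The pipelining mentioned at the end of this abstract (an early phase on one device feeding a later phase on another, so that both work on different batches at once) has the general shape of a two-stage producer/consumer pipeline. The sketch below shows only that structure and is not the authors' implementation: `run_pipeline`, `find_hits`, and `score_hits` are hypothetical names, both stages are plain Python stand-ins, and Python threads will not deliver real CPU overlap for pure-Python work because of the global interpreter lock.

```python
import queue
import threading
from typing import Callable, Iterable, List

_DONE = object()  # sentinel marking the end of the stream


def run_pipeline(batches: Iterable, stage_one: Callable, stage_two: Callable,
                 depth: int = 2) -> List:
    """Run stage_one in a background thread while stage_two consumes its output."""
    handoff: queue.Queue = queue.Queue(maxsize=depth)  # bounded buffer between stages
    results = []

    def producer() -> None:
        try:
            for batch in batches:
                handoff.put(stage_one(batch))
        finally:
            handoff.put(_DONE)  # always signal completion so the consumer stops

    worker = threading.Thread(target=producer)
    worker.start()
    while True:
        item = handoff.get()
        if item is _DONE:
            break
        results.append(stage_two(item))
    worker.join()
    return results


def find_hits(batch: List[str], word: str = "AC") -> List[List[int]]:
    """Stage-one stand-in: start positions of a fixed word in each sequence."""
    span = len(word)
    return [[i for i in range(len(seq) - span + 1) if seq[i:i + span] == word]
            for seq in batch]


def score_hits(hits: List[List[int]]) -> List[int]:
    """Stage-two stand-in: reduce each sequence's hit list to a count."""
    return [len(positions) for positions in hits]


if __name__ == "__main__":
    print(run_pipeline([["ACAC", "GGG"], ["TACG"]], find_hits, score_hits))
```

The bounded queue keeps the first stage from running arbitrarily far ahead of the second, which is one simple way to limit buffering between the two phases.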
