In this paper, we present GSWABE, a graphics processing unit (GPU)-accelerated pairwise sequence alignment algorithm for collections of short DNA sequences. The algorithm supports all-to-all pairwise global, semi-global and local alignment, and retrieves optimal alignments on Compute Unified Device Architecture (CUDA)-enabled GPUs. All three alignment types are based on dynamic programming and share almost the same computational pattern. We have therefore investigated a general tile-based approach that facilitates fast alignment by fully exploiting the compute capability of CUDA-enabled GPUs. The performance of GSWABE has been evaluated on a Kepler-based Tesla K40 GPU using a variety of short DNA sequence datasets. The results show that our algorithm can yield a performance of up to 59.1 billion cell updates per second (GCUPS), 58.5 GCUPS and 50.3 GCUPS for global, semi-global and local alignment, respectively. Furthermore, on the same system, GSWABE runs up to 156.0 times faster than the Streaming SIMD Extensions (SSE)-based SSW library and up to 102.4 times faster than the CUDA-based MSA-CUDA (first stage) for local alignment. Compared with the CUDA-based gpu-pairAlign, GSWABE demonstrates stable and consistent speedups, with maximum speedups of 11.2, 10.7 and 10.6 for global, semi-global and local alignment, respectively.

Pairwise alignment is computationally demanding, especially for large-scale datasets. This has therefore driven a substantial amount of research into parallelizing pairwise alignment on high-performance computing architectures, ranging from loosely coupled to tightly coupled ones, including clouds [12], clusters [13,14] and accelerators [15,16]. Among these architectures, accelerators, including the single instruction multiple data (SIMD) vector processing units (VPUs) affiliated with CPUs, field-programmable gate arrays (FPGAs) and general-purpose GPUs, have recently become the predominant techniques.

The SIMD VPUs affiliated with CPUs are the most widely used of these techniques. Two general approaches have been investigated to exploit the computational features of SIMD vectors: the inter-task (or inter-sequence) parallelization model and the intra-task (or intra-sequence) model. The inter-task model performs multiple alignments in individual SIMD vectors, with one vector lane computing one alignment (e.g. [17]); this model is illustrated by the sketch at the end of this section. The intra-task model computes the alignment of a single sequence pair in parallel within vectors, following one of two computational patterns: vectorized computation parallel to the minor diagonals of the alignment matrix [18], or vectorized computation parallel to the query sequence in a sequential [19] or striped [20] layout. The two models provide a general framework for other accelerators with SIMD VPUs, including the Cell Broadband Engine and general-purpose GPUs. A few implementations [21,22] have been proposed for the Cell Broadband Engine, all of which are based on the intra-task model with the striped layout. On general-purpose GPUs, the open graphics library was initially...
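To make the preceding description concrete, the sketch below illustrates two points from the text: the inter-task parallelization model mapped to a CUDA GPU (one thread computes one independent alignment), and the observation that global, semi-global and local alignment share almost the same dynamic-programming recurrence, differing only in initialization, clamping and where the optimum is read. This is a minimal, score-only sketch under stated assumptions, not the GSWABE implementation: the kernel and symbol names (alignPairKernel, MAX_LEN, etc.) are illustrative, a linear gap penalty is used for brevity, and the semi-global variant shown (free leading and trailing gaps in the target) is only one common convention.

```cuda
// Minimal sketch (not the GSWABE code) of the inter-task model:
// each CUDA thread computes the alignment score of one sequence pair,
// so thousands of pairwise alignments proceed concurrently.
// All names and scoring parameters below are illustrative assumptions.

#include <cuda_runtime.h>

enum AlignType { GLOBAL, SEMI_GLOBAL, LOCAL };

#define MAX_LEN  256   // assumed upper bound on short-read length
#define MATCH      2   // illustrative substitution scores
#define MISMATCH  -1
#define GAP       -2   // linear gap penalty, used here for brevity

__global__ void alignPairKernel(const char* queries, const int* qLens,
                                const char* targets, const int* tLens,
                                int numPairs, AlignType type, int* scores)
{
    int p = blockIdx.x * blockDim.x + threadIdx.x;     // one thread = one pair
    if (p >= numPairs) return;

    const char* q = queries + (size_t)p * MAX_LEN;
    const char* t = targets + (size_t)p * MAX_LEN;
    const int m = qLens[p], n = tLens[p];

    int row[MAX_LEN + 1];                  // rolling row of the DP matrix H

    // First row: the only per-type difference is whether leading target gaps cost.
    for (int j = 0; j <= n; ++j)
        row[j] = (type == GLOBAL) ? j * GAP : 0;

    int best = 0;
    for (int i = 1; i <= m; ++i) {
        int diag = row[0];                             // H[i-1][0]
        row[0] = (type == LOCAL) ? 0 : i * GAP;        // first column
        for (int j = 1; j <= n; ++j) {
            int up   = row[j];                         // H[i-1][j]
            int left = row[j - 1];                     // H[i][j-1]
            int sub  = (q[i - 1] == t[j - 1]) ? MATCH : MISMATCH;
            int h = diag + sub;
            if (up   + GAP > h) h = up   + GAP;
            if (left + GAP > h) h = left + GAP;
            if (type == LOCAL && h < 0) h = 0;         // clamp only for local
            diag = up;                                 // H[i-1][j-1] for next j
            row[j] = h;
            if (type == LOCAL && h > best) best = h;   // local: best cell anywhere
        }
        if (type == SEMI_GLOBAL && i == m) {
            best = row[0];                             // semi-global: best in last row
            for (int j = 1; j <= n; ++j)
                if (row[j] > best) best = row[j];
        }
    }
    scores[p] = (type == GLOBAL) ? row[n] : best;      // global: bottom-right cell
}
```

Because each thread owns an independent DP row, this inter-task layout needs no inter-thread communication. The intra-task model would instead spread a single alignment matrix across the lanes of a vector or warp, either along the minor diagonals or along the query sequence in a sequential or striped layout.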