Heterogeneous hardware/software acceleration of the BWA-MEM DNA alignment algorithm

Ahmed, Nauman; Sima, Vlad-Mihai; Houtgast, Ernst; Bertels, Koen; Al-Ars, Zaid

doi:10.1109/iccad.2015.7372576

Cited by 45 publications

(28 citation statements)

References 14 publications

“…By offloading the computational bottleneck onto a Virtex-7 XC7VX690T-2 FPGA, the entire system can deliver a total acceleration of about 45%. This work is later extended by Ahmed et al. [91] where a hardware suffix array is used to partially accelerate SMEM generation, which enables a total application acceleration of 2.6× compared to the original software version.…”

Section: Mapping (mentioning)

confidence: 99%

Reconfigurable acceleration of genetic sequence alignment: A survey of two decades of efforts

Liu, Luk, 2017

2017 27th International Conference on Field Programmable Logic and Applications (FPL)

Abstract: Genetic sequence alignment has always been a computational challenge in bioinformatics. Depending on the problem size, software-based aligners can take multiple CPU-days to process the sequence data, creating a bottleneck point in bioinformatic analysis flow. Reconfigurable accelerator can achieve high performance for such computation by providing massive parallelism, but at the expense of programming flexibility and thus has not been commensurately used by practitioners. Therefore, this paper aims to provide a thorough survey of the proposed accelerators by giving a qualitative categorization based on their algorithms and speedup. A comprehensive comparison between work is also presented so as to guide selection for biologist, and to provide insight on future research direction for FPGA scientists.

“…To the authors' knowledge, only a few accelerated implementations of BWA-MEM exist: two FPGA implementations of BWA-MEM on the Convey supercomputing platform: one offloading the Seed Extension phase onto four Xilinx Virtex-6 FPGAs [4] obtaining a 1.5x speedup, the other accelerating multiple BWA-MEM phases [1] obtaining a 2.6x speedup; and a GPU-accelerated implementation of the Seed Extension phase [5], achieving a 1.6x speedup. This work improves upon [5], obtaining far better results: a two-fold speedup for a system with up to twenty-two logical cores is obtained, compared to an at most 1.6x speedup for a system with up to four cores.…”

Section: Related Work (mentioning)

confidence: 99%

“…To achieve this, it makes use of the Seed-and-Extend paradigm (refer to Figure 1), a two-step method consisting of an Exact Matching phase and an Inexact Matching phase (for details, see [1]). First, for each short read Seed Generation is performed: exactly matching subsequences of the read and reference called seeds are identified using a Burrows-Wheeler Transform-based index.…”

Section: The BWA-MEM Algorithm (mentioning)

confidence: 99%
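
The exact-matching step described in the statement above can be illustrated with a toy Burrows-Wheeler (FM-index) backward search. This is a minimal sketch under stated assumptions, not BWA-MEM's seed generation: it finds exact occurrences of one fixed pattern rather than maximal exact matches, it builds the suffix array by naive sorting, and all function names (`build_fm_index`, `occ`, `exact_match`) are hypothetical.

```python
def build_fm_index(text):
    """Build a toy FM-index (suffix array, BWT, C table); '$' is the sentinel."""
    text += "$"
    # Naive suffix array: fine for a toy example, far too slow for a genome.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT character = the character preceding each sorted suffix.
    bwt = "".join(text[i - 1] for i in sa)
    # c[ch] = number of characters in text strictly smaller than ch.
    c, total = {}, 0
    for ch in sorted(set(text)):
        c[ch] = total
        total += text.count(ch)
    return sa, bwt, c


def occ(bwt, ch, i):
    """Occurrences of ch in bwt[:i] (naive scan; real indexes use sampled tables)."""
    return bwt.count(ch, 0, i)


def exact_match(pattern, sa, bwt, c):
    """Backward search: sorted reference positions where pattern occurs exactly."""
    lo, hi = 0, len(bwt)
    for ch in reversed(pattern):
        if ch not in c:
            return []
        lo = c[ch] + occ(bwt, ch, lo)
        hi = c[ch] + occ(bwt, ch, hi)
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


reference = "ACGTACGTTACG"          # made-up reference sequence
sa, bwt, c = build_fm_index(reference)
print(exact_match("ACG", sa, bwt, c))  # positions of the candidate seed "ACG"
```

In a seed-and-extend aligner, positions found this way would then be handed to the inexact-matching (extension) phase.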

An Efficient GPU-Accelerated Implementation of Genomic Short Read Mapping with BWA-MEM

Houtgast, Sima, Bertels, et al. 2017

SIGARCH Comput. Archit. News

Self Cite

Next Generation Sequencing techniques have resulted in an exponential growth in the generation of genetics data, the amount of which will soon rival, if not overtake, other Big Data fields, such as astronomy and streaming video services. To become useful, this data requires processing by a complex pipeline of algorithms, taking multiple days even on large clusters. The mapping stage of such genomics pipelines, which maps the short reads onto a reference genome, takes up a significant portion of execution time. BWA-MEM is the de-facto industry-standard for the mapping stage. Here, a GPU-accelerated implementation of BWA-MEM is proposed. The Seed Extension phase, one of the three main BWA-MEM algorithm phases that requires between 30%-50% of overall processing time, is offloaded onto the GPU. A thorough design space analysis is presented for an optimized mapping of this phase onto the GPU. The resulting systolic-array based implementation obtains a twofold overall application-level speedup, which is the maximum theoretically achievable speedup. Moreover, this speedup is sustained for systems with up to twenty-two logical cores. Based on the findings, a number of suggestions are made to improve GPU architecture, resulting in potentially greatly increased performance for bioinformatics-class algorithms.
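
One way to read the "maximum theoretically achievable speedup" claim is through Amdahl's law. The sketch below assumes that only the Seed Extension fraction is accelerated and that the rest of the run is unchanged; the function name `amdahl_speedup` is hypothetical, and the abstract does not state that this is the model the authors used.

```python
def amdahl_speedup(fraction, accel):
    """Overall speedup when `fraction` of the runtime is sped up by `accel`."""
    return 1.0 / ((1.0 - fraction) + fraction / accel)


# Upper bound: the offloaded phase takes no time at all (accel -> infinity).
for fraction in (0.3, 0.5):  # the 30%-50% range quoted in the abstract
    bound = amdahl_speedup(fraction, float("inf"))
    print(f"fraction={fraction:.0%}: at most {bound:.2f}x")
```

Under this assumption, a phase that takes 50% of the runtime caps the overall speedup at 2x, which is consistent with the twofold figure in the abstract; at 30% the cap is lower.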

“…To our knowledge the only application-level accelerated integrated implementations of BWA-MEM that exist are: an FPGA-accelerated implementation of the Seed Extension phase [15] achieving a 1.5x speedup, further improved in [16] for an overall 2.6x speedup; and a GPU implementation [9], further improved to achieve an up to 2x speedup [17]. The FPGA implementation used here builds on [15], and a comparison of the implementation here is made to the improved GPU implementation.…”

Section: Related Work (mentioning)

confidence: 99%

“…A significant difference to the design in [15] and [16] is the fact that the Alpha Data card used here contains only a single Virtex-7 FPGA, whereas [15] and [16] use the Convey HC-2 EX as implementation platform, which contains four user-configurable Virtex-6 FPGAs. As the design here is limited by the amount of LUTs available, and the Virtex-7 FPGA on the Alpha Data card contains 432,368 LUTs versus 474,240 LUTs per Virtex-6 FPGA on the Convey, this means only about 23% of the resources are available as compared to the Convey platform.…”

Section: A. FPGA Design and Implementation (mentioning)

confidence: 99%
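
The "about 23%" figure in the statement above follows from the quoted LUT counts, assuming the comparison is one Virtex-7 against all four Virtex-6 FPGAs combined. A short check (constant and function names are illustrative only):

```python
# LUT counts as quoted in the citation statement.
VIRTEX7_LUTS = 432_368      # single Virtex-7 FPGA on the Alpha Data card
VIRTEX6_LUTS = 474_240      # per Virtex-6 FPGA on the Convey HC-2 EX
CONVEY_FPGA_COUNT = 4       # user-configurable FPGAs on the Convey platform


def lut_ratio(single_luts, per_fpga_luts, fpga_count):
    """Fraction of the multi-FPGA platform's LUTs offered by the single FPGA."""
    return single_luts / (per_fpga_luts * fpga_count)


ratio = lut_ratio(VIRTEX7_LUTS, VIRTEX6_LUTS, CONVEY_FPGA_COUNT)
print(f"{ratio:.1%}")  # prints 22.8%
```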

Power-efficiency analysis of accelerated BWA-MEM implementations on heterogeneous computing platforms

Houtgast, Sima, Marchiori, et al. 2016

2016 International Conference on ReConFigurable Computing and FPGAs (ReConFig)

Self Cite

Abstract: Next Generation Sequencing techniques have dramatically reduced the cost of sequencing genetic material, resulting in huge amounts of data being sequenced. The processing of this data poses huge challenges, both from a performance perspective, as well as from a power-efficiency perspective. Heterogeneous computing can help on both fronts, by enabling more performant and more power-efficient solutions. In this paper, power-efficiency of the BWA-MEM algorithm, a popular tool for genomic data mapping, is studied on two heterogeneous architectures. The performance and power-efficiency of an FPGA-based implementation using a single Xilinx Virtex-7 FPGA on the Alpha Data add-in card is compared to a GPU-based implementation using an NVIDIA GeForce GTX 970 and against the software-only baseline system. By offloading the Seed Extension phase on an accelerator, both implementations are able to achieve a two-fold speedup in overall application-level performance over the software-only implementation. Moreover, the highly customizable nature of the FPGA results in much higher power-efficiency, as the FPGA power consumption is less than one fourth of that of the GPU. To facilitate platform and tool-agnostic comparisons, the base pairs per Joule unit is introduced as a measure of power-efficiency. The FPGA design is able to map up to 44 thousand base pairs per Joule, a 2.1x gain in power-efficiency as compared to the software-only baseline.
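
The base pairs per Joule unit can be read as base pairs mapped divided by the energy consumed, where energy is average power multiplied by runtime. The sketch below assumes that reading; the function name `bp_per_joule` and the workload numbers are hypothetical, and only the 44 thousand and 2.1x figures come from the abstract.

```python
def bp_per_joule(base_pairs, avg_power_watts, runtime_seconds):
    """Base pairs mapped per Joule, with energy = average power * runtime."""
    energy_joules = avg_power_watts * runtime_seconds
    return base_pairs / energy_joules


# Hypothetical workload: 1,000,000 base pairs mapped in 2 s at 50 W.
print(bp_per_joule(1_000_000, 50.0, 2.0))

# Rough implied baseline, assuming the 2.1x gain applies to the 44 thousand figure.
fpga_bp_per_joule = 44_000
implied_baseline = fpga_bp_per_joule / 2.1
print(f"about {implied_baseline / 1000:.0f} thousand base pairs per Joule")
```

Because the metric divides out both runtime and power, it lets platforms with different speeds and power draws be compared on a single axis.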
