As multithreaded and reconfigurable logic architectures play an increasing role in high-performance computing (HPC), the scientific community is in need of new programming models for efficiently mapping existing applications to these parallel platforms. In this paper, we show how tightly coupled fine-grained parallelism in architectures such as GPUs and FPGAs can be exploited effectively to speed up applications described by uniform recurrence equations. We introduce the concept of rolling partial-prefix sums to dynamically track and resolve multiple dependencies without having to evaluate intermediate values. Rolling partial-prefix sums are applicable to the low-latency evaluation of dynamic programming problems expressed as uniform or affine recurrence equations. To assess our approach, we consider two common problems in computational biology: hidden Markov models for protein motif finding (HMMER) and the Smith-Waterman algorithm. We present a platform-independent, linear-time solution to HMMER, which is traditionally solved in bilinear time, and a platform-independent, sub-linear-time solution to Smith-Waterman, which is normally solved in linear time.

Keywords: Dynamic Programming, HMMER, Protein-Motif Finding, GPUs, Parallelization, Computational Biology
GENERAL ROLLING PARTIAL PREFIX-SUMS ALGORITHM

Let D be a finite domain of points. Each point corresponds to a unique sub-problem, or cell, in a dynamic programming matrix. Let F be a function from D to a "result" domain Σ (e.g., the real numbers) that corresponds to the computation of the cost of a point in D. We seek to compute the value F(d) for every point d ∈ D. Let (Σ, ∧) form a commutative semigroup, i.e., the operator ∧ is a commutative, associative binary operator on results. Suppose that F(d) is computable for any d ∈ D as follows:

    F(d) = ⋀_i f_i(d),

where the summary operator ⋀ is the natural extension of ∧ from two to any nonzero number of arguments; it maps two or more values in the result domain into a single value in the same domain. Each function f_i(d) maps a finite number of points in the domain D to one element of the result domain Σ. Here we consider only monadic recurrences, where each function can be written as

    f_i(d) = F(d'_i(d)) ⊕ h_i(d),

where ⊕ is a binary extension operator on the results F(d'), and h_i(d) ∈ Σ is a "local" function that depends only on d, such as a look-up table entry, and can be computed without knowledge of any value of F. The relation d' < d must be satisfied, according to a partial order <, in order to avoid cyclic dependencies. The minimal elements of the partial order are the "base" cases. A subset B of D is said to be "sufficient" for d if every dependency path from d back to the base cases passes through an element of B. The nature of this dependency imposes a sequential execution of the function F as dictated by the partial order. Therefore, the number of algorithmic time-steps for sequential execution grows as the size of the domain D modulo <, i.e., equal to the total number of sets in D such that any two elements from different sets follow t...
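To make the recurrence framework above concrete, the following is a minimal sketch (not the paper's implementation) of sequentially evaluating a monadic recurrence F(d) = ⋀_i (F(d'_i(d)) ⊕ h_i(d)) over a one-dimensional, totally ordered domain. Here the semigroup (Σ, ∧) is assumed to be (numbers, max) and ⊕ is assumed to be ordinary addition, as in a (max, +) scoring recurrence; the dependency functions and local costs in the example are illustrative assumptions.

```python
def evaluate(n, deps, base):
    """Sequentially compute F(0..n-1) for a monadic recurrence.

    deps: list of (d_prime, h) pairs, where d_prime(d) returns the
          dependency point d'_i(d) < d, and h(d) returns the "local"
          value h_i(d), computable without knowledge of F.
    base: value of F at minimal ("base case") points with no
          in-domain dependencies.
    """
    F = [None] * n
    # Ascending order of d is a linear extension of the partial
    # order <, so every F(d') is available when F(d) is computed.
    for d in range(n):
        # One candidate f_i(d) = F(d'_i(d)) (+) h_i(d) per dependency;
        # the summary operator /\ is max in this (max, +) instance.
        candidates = [F[dp(d)] + h(d) for dp, h in deps if 0 <= dp(d) < d]
        F[d] = max(candidates) if candidates else base(d)
    return F

# Illustrative instance: two dependencies d-1 and d-2 with constant
# local costs 1 and 2, and base value 0 at minimal points.
vals = evaluate(
    n=6,
    deps=[(lambda d: d - 1, lambda d: 1),
          (lambda d: d - 2, lambda d: 2)],
    base=lambda d: 0,
)
```

Note that the loop over d is exactly the sequential execution dictated by the partial order: each time-step consumes one point, which is the bottleneck the rolling partial-prefix sums technique is designed to break.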