A Deterministic Finite Automaton for Faster Protein Hit Detection in BLAST

Cameron, Michael J.; Williams, Hugh E.; Cannane, Adam

doi:10.1089/cmb.2006.13.965

Cited by 32 publications (25 citation statements)

References 21 publications


“…A major limitation when designing SPE kernels is that their local memory is only 256 kB for both instructions and data. Using the default parameters for w and T, the size of the lookup table used for Stage 1 by NCBI BLASTP is already around 400 kB for a query sequence of average length [5]. Therefore, we need to use an alternative data structure which requires significantly less memory.…”

Section: Parallelization Approach (mentioning; confidence: 99%)
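To make the memory pressure concrete, here is a minimal sketch of a conventional direct-address word table, assuming a 20-letter amino-acid alphabet, w = 3, and exact-match words only (real BLASTP also adds neighbourhood words scoring at least T, which this sketch omits). The names `word_index` and `build_lookup_table` are hypothetical, and the layout is illustrative rather than NCBI's actual data structure. Every one of the 20^3 = 8,000 possible words gets its own slot, whether or not it occurs in the query.

```python
# Assumed 20-letter amino-acid alphabet (no ambiguity codes).
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"
CODE = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def word_index(word):
    """Map a length-w word to its slot by reading it as a base-20 number."""
    idx = 0
    for aa in word:
        idx = idx * len(AMINO_ACIDS) + CODE[aa]
    return idx


def build_lookup_table(query, w=3):
    """Hypothetical direct-address table: one slot per possible word.

    Each slot lists the query offsets where that exact word occurs.
    """
    table = [[] for _ in range(len(AMINO_ACIDS) ** w)]
    for pos in range(len(query) - w + 1):
        table[word_index(query[pos:pos + w])].append(pos)
    return table


table = build_lookup_table("MKVLAAGIVALLLAAG")
print(len(table))                     # 8000
print(table[word_index("AAG")])       # [4, 13]
```

The table always has 8,000 slots regardless of query length, and each non-empty slot additionally carries its list of offsets, which is where the storage grows once neighbourhood words are included.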

“…Therefore, we are using a more memory-efficient data structure for Stage 1. The utilized data structure is a compressed deterministic finite-state automaton (DFA), which is similar to the approach used by FSA-BLAST [4,5]. The compressed DFA for w = 3 is illustrated in Fig.…”

Section: BLASTP (mentioning; confidence: 99%)
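A DFA in the spirit of the one described above can be sketched as follows, assuming the same hypothetical 20-letter alphabet, w = 3, and exact-match words. Each state encodes the last w - 1 residues read as a base-20 number; reading the next residue selects the query offsets of the completed word and moves to the state for the newest w - 1 residues, so the scan performs one transition per subject residue. This is an uncompressed sketch: `build_dfa` and `scan` are hypothetical names, and the compression scheme used by FSA-BLAST and the Cell implementations is not reproduced here.

```python
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"  # assumed 20-letter alphabet
A = len(AMINO_ACIDS)
CODE = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def build_dfa(query, w=3):
    """Hypothetical uncompressed DFA: hits[state][c] lists query offsets of
    the word formed by the w-1 residues encoded in `state` followed by c."""
    n_states = A ** (w - 1)
    hits = [[[] for _ in range(A)] for _ in range(n_states)]
    for pos in range(len(query) - w + 1):
        word = query[pos:pos + w]
        state = 0
        for aa in word[:-1]:
            state = state * A + CODE[aa]
        hits[state][CODE[word[-1]]].append(pos)
    return hits


def scan(subject, hits, w=3):
    """Yield (query_offset, subject_offset) for every exact word hit."""
    n_states = A ** (w - 1)
    state = 0
    for aa in subject[:w - 1]:          # prime the state with the first w-1 residues
        state = state * A + CODE[aa]
    for j in range(w - 1, len(subject)):
        c = CODE[subject[j]]
        for q in hits[state][c]:        # word ending at subject position j
            yield q, j - w + 1
        state = (state * A + c) % n_states  # append c, drop the oldest residue


hits = build_dfa("MKVLAAGIVALLLAAG")
print(list(scan("GGAAGT", hits)))  # [(4, 2), (13, 2)]
```

The word `AAG` at subject offset 2 matches query offsets 4 and 13; no other subject word occurs in the query.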

“…nlm.nih.gov/BLAST/developer.shtml). FSA-BLAST uses an optimized sequential algorithm and is around 15% faster than NCBI-BLASTP with no loss in accuracy [4,5]. FSA-BLASTP and NCBI-BLASTP were tested on an HP xw4200 workstation with a dual-core Pentium® 4 (P4) CPU at 3 GHz and 2 GB of RAM.…”

Section: BLASTP (mentioning; confidence: 99%)


High Performance Protein Sequence Database Scanning on the Cell Broadband Engine

Wirawan, Schmidt, Zhang, et al., 2009

Scientific Programming


Abstract. The enormous growth of biological sequence databases has caused bioinformatics to move rapidly towards a data-intensive computational science. As a result, the computational power needed by bioinformatics applications is growing rapidly as well. The recent emergence of low-cost parallel multicore accelerator technologies has made it possible to reduce the execution times of many bioinformatics applications. In this paper, we demonstrate how the Cell Broadband Engine can be used as a computational platform to accelerate two approaches to protein sequence database scanning: exhaustive and heuristic. We present efficient parallelization techniques for two representative algorithms: the dynamic-programming-based Smith-Waterman algorithm and the popular BLASTP heuristic. Their implementation on a PlayStation® 3 leads to significant runtime savings compared to the corresponding sequential implementations.


Section: Parallelization Approach (mentioning; confidence: 99%)

Section: BLASTP (mentioning; confidence: 99%)


High Performance Protein Sequence Database Scanning on the Cell Broadband Engine

Wirawan, Schmidt, Zhang, et al., 2009

Scientific Programming


“…Therefore, we are using a more memory-efficient data structure for Stage 1. The utilized data structure is a compressed deterministic finite-state automaton (DFA), which is similar to the approach used by FSA-BLAST [3,4]. The compressed DFA for w=3 is illustrated in Figure 4.…”

Section: Data Transfer and Coordination Between PPE and SPEs The Dif… (mentioning; confidence: 99%)

Accelerating BLASTP on the Cell Broadband Engine

Zhang, Schmidt, Müller-Wittig, 2008

Pattern Recognition in Bioinformatics


“…Deterministic finite-state automata (DFA) have applications in natural language processing (Roche and Schabes, 1997), medical data analysis (Lewis et al., 2010), network intrusion detection (Tuck et al., 2004), computational biology (Cameron et al., 2005), and other fields. Although DFA are less compact than their non-deterministic counterparts, they are easier to work with algorithmically, and their uniform membership problem, in which the language model is also part of the input, can be decided in time O(|w| log |Q|), where w is the input string and Q the state space.…”

Section: Introduction (mentioning; confidence: 99%)
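The linear run of a DFA mentioned in the quotation can be illustrated with a minimal sketch that takes the automaton as part of its input, as in the uniform membership problem. Representing the transition function as a Python dict keyed by (state, symbol) is an assumption of this sketch; it makes no attempt to reproduce the cost model behind the quoted O(|w| log |Q|) bound and only shows that the run performs one transition per input symbol. The function `accepts` and the example automaton `EVEN_ONES` are hypothetical.

```python
def accepts(delta, start, accepting, word):
    """Run a DFA supplied as input: one transition lookup per symbol of `word`."""
    state = start
    for symbol in word:
        state = delta.get((state, symbol))
        if state is None:  # no transition defined: treat as an implicit rejecting sink
            return False
    return state in accepting


# Hypothetical example DFA over {"0", "1"}: accepts strings with an even number of 1s.
EVEN_ONES = {
    ("even", "0"): "even",
    ("even", "1"): "odd",
    ("odd", "0"): "odd",
    ("odd", "1"): "even",
}

print(accepts(EVEN_ONES, "even", {"even"}, "1001"))  # True
print(accepts(EVEN_ONES, "even", {"even"}, "1011"))  # False
```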

Compression of finite-state automata through failure transitions

Björklund, Zechner, 2014

Theoretical Computer Science


Abstract. Several linear-time algorithms for automata-based pattern matching rely on failure transitions for efficient backtracking. Like epsilon transitions, failure transitions do not consume input symbols, but unlike them, they may only be taken when no other transition is applicable. At a semantic level, this conveniently models catch-all clauses and allows for compact language representation. This work investigates the transition-reduction problem for deterministic finite-state automata (DFA). The input is a DFA A and an integer k. The question is whether k or more transitions can be saved by replacing regular transitions with failure transitions. We show that while the problem is NP-complete, there are approximation techniques and heuristics that mitigate the computational complexity. We conclude by demonstrating the computational difficulty of two related minimisation problems, thereby cancelling the ongoing search for efficient algorithms.
