2006
DOI: 10.1089/cmb.2006.13.965

A Deterministic Finite Automaton for Faster Protein Hit Detection in BLAST

Abstract: BLAST is the most popular bioinformatics tool and is used to run millions of queries each day. However, evaluating such queries is slow, typically taking minutes on modern workstations. Therefore, continuing evolution of BLAST, by improving its algorithms and optimizations, is essential to improve search times in the face of exponentially increasing collection sizes. We present an optimization to the first stage of the BLAST algorithm specifically designed for protein search. It produces the same results as NCBI…

citations
Cited by 32 publications
(25 citation statements)
references
References 21 publications
“…A major limitation when designing SPE kernels is that their local memory is only 256 kB for both instructions and data. Using default parameter for w and T the size of the lookup table used for Stage 1 by NCBI BLASTP is already around 400 kB for a query sequence of average length [5]. Therefore, we need to use an alternative data structure which requires significantly less memory.…”
Section: Parallelization Approach (mentioning)
confidence: 99%
“…Therefore, we are using a more memory-efficient data structure for Stage 1. The utilized data structure is a compressed deterministic finite-state automaton (DFA), which is similar to the approach used by FSA-BLAST [4,5]. The compressed DFA for w = 3 is illustrated in Fig.…”
Section: BLASTP (mentioning)
confidence: 99%
“…Therefore, we are using a more memory-efficient data structure for Stage 1. The utilized data structure is a compressed deterministic finite-state automaton (DFA), which is similar to the approach used by FSA-BLAST [3,4]. The compressed DFA for w=3 is illustrated in Figure 4.…”
Section: Data Transfer and Coordination Between PPE and SPEs The Dif (mentioning)
confidence: 99%
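
The statements above describe replacing the Stage 1 lookup table with a DFA. The following is a minimal, uncompressed sketch of that idea for w = 3, not the FSA-BLAST or NCBI BLASTP implementation. Each state is the last w - 1 residues read, and each transition carries the query offsets whose word scores at least T against the word just completed. The names `toy_score`, `build_dfa`, and `find_hits` are hypothetical, and the identity-based scoring with `threshold=9` is an assumed stand-in for a real substitution matrix and the default T.

```python
from itertools import product

ALPHABET = "ARNDCQEGHILKMFPSTWYV"  # the 20 standard amino acids


def toy_score(a, b):
    # Assumption: a stand-in for a substitution matrix such as BLOSUM62.
    return 5 if a == b else -1


def build_dfa(query, w=3, threshold=9, score=toy_score):
    """Map state (last w-1 residues) -> next residue -> hitting query offsets."""
    dfa = {}
    for word in product(ALPHABET, repeat=w):
        # Query offsets whose w-length word scores >= threshold against `word`.
        offsets = [
            i
            for i in range(len(query) - w + 1)
            if sum(score(x, y) for x, y in zip(word, query[i:i + w])) >= threshold
        ]
        if offsets:
            dfa.setdefault(word[:-1], {})[word[-1]] = offsets
    return dfa


def find_hits(dfa, subject, w=3):
    """Scan the subject once; return (query_offset, subject_offset) pairs."""
    hits = []
    if len(subject) < w:
        return hits
    state = tuple(subject[:w - 1])
    for j in range(w - 1, len(subject)):
        residue = subject[j]
        for q in dfa.get(state, {}).get(residue, ()):
            hits.append((q, j - w + 1))
        # The next state is the most recent w-1 residues.
        state = state[1:] + (residue,)
    return hits


dfa = build_dfa("MKVLA")
print(find_hits(dfa, "GGMKVLAGG"))
```

With this toy scoring, a threshold of 9 admits words that differ from a query word in at most one position. A production structure would also need the compression the citing papers mention, which this sketch does not attempt.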
“…Deterministic finite-state automata (DFA) have applications in natural language processing (Roche and Shabes, 1997), medical data analysis (Lewis et al, 2010), network intrusion detection (Tuck et al, 2004), computational biology (Cameron et al, 2005), and other fields. Although DFA are less compact than their non-deterministic counterpart, they are easier to work with algorithmically, and their uniform membership problem, when also the language model is part of the input, can be decided in time O(|w| log |Q|), where w is the input string and Q the state space.…”
Section: Introduction (mentioning)
confidence: 99%
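
The membership problem mentioned in this statement can be illustrated with a short sketch: given a DFA and an input string, follow one transition per symbol and check whether the final state is accepting. The function name `accepts` and the dictionary encoding of the transition function are assumptions made for illustration. The sketch does not model the bit-level cost behind the quoted O(|w| log |Q|) bound.

```python
def accepts(transitions, start, accepting, word):
    """Return True if the DFA accepts `word`.

    `transitions` maps (state, symbol) -> next state; a missing entry
    is treated as a transition to a rejecting dead state.
    """
    state = start
    for symbol in word:
        state = transitions.get((state, symbol))
        if state is None:
            return False
    return state in accepting


# Example DFA over {a, b}: accepts strings with an even number of 'b'.
even_b = {
    (0, "a"): 0, (0, "b"): 1,
    (1, "a"): 1, (1, "b"): 0,
}
print(accepts(even_b, 0, {0}, "abba"))
print(accepts(even_b, 0, {0}, "ab"))
```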