2006
DOI: 10.1186/1471-2105-7-389

Fast index based algorithms and software for matching position specific scoring matrices

Abstract: Background: In biological sequence analysis, position specific scoring matrices (PSSMs) are widely used to represent sequence motifs in nucleotide as well as amino acid sequences. Searching with PSSMs in complete genomes or large sequence databases is a common, but computationally expensive task. Results: We present a new non-heuristic algorithm, called ESAsearch, to efficiently find matches of PSSMs in large databases. Our approach preprocesses the search space, e.g., a complete genome or a set of protein sequence…

Cited by 131 publications (119 citation statements)
References 29 publications
“…Combining these data sets yielded a group of candidate genes that are both conserved in sequenced corynebacteria and most likely under transcriptional control by GlxR and its orthologs, respectively. The workflow is completed by performing electrophoretic mobility shift assays (EMSAs) for in vitro verification of detected binding sites in the model organism C. glutamicum ATCC 13032. raising sensitivity at the cost of diminished specificity and thereby increasing the number of detected motif instances (Beckstette et al, 2006). Subsequently, a bi-directional best blast analysis with an E-value threshold of 10⁻⁵ was employed to identify potentially orthologous genes.…”
Section: Detection Of Highly Conserved Glxr Target Genes In Corynebac
Citation type: mentioning
confidence: 99%
“…DNA binding sites were detected using the PoSSuMsearch algorithm (Beckstette et al, 2006) with all upstream regions of the organism, extracted from the genomic sequence as described previously for C. glutamicum (Kohl et al, 2008). The position weight matrix (PWM)-based model of the GlxR binding motif used in the search was derived from all verified binding sites in C. glutamicum ATCC 13032.…”
Section: Detection Of Glxr Binding Sites
Citation type: mentioning
confidence: 99%
“…Putative transcription factor-binding sites were identified using the PSSM search module of Biopython. The significance threshold for binding sites in the context of multiple-hypothesis testing was defined by computing the exact probability distributions for site scores under the PSSM and genomic background models with dynamic programming and controlling the rate of false-positive results by defining the probability of finding at least one false-positive result in a sequence of 350 bp (α₃₅₀ = 0.01) (46,47). Comparative genomics analysis.…”
Section: T T T T a T T C A G T C T T A G A A T T G A T G C A G A T A
Citation type: mentioning
confidence: 99%
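
The thresholding idea in the statement above (an exact score distribution computed by dynamic programming, then a cutoff chosen to control false positives) can be sketched as follows. This is a minimal illustration, not the cited authors' code and not Biopython's API. The function names, the toy two-column matrix with integer scores, the uniform background, and the independence assumption used to turn a per-sequence level (0.01 over 350 positions) into a per-site level are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical toy inputs: one dict of integer scores per motif column.
TOY_PSSM = [
    {"A": 2, "C": -1, "G": -1, "T": -1},
    {"A": -1, "C": 3, "G": -1, "T": -1},
]
UNIFORM_BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def score_distribution(pssm, background):
    """Exact distribution of total scores under an i.i.d. background model.

    Columns are processed one at a time; after each column, `dist` maps
    every reachable partial score to its probability.
    """
    dist = {0: 1.0}  # no columns consumed yet: score 0 with probability 1
    for column in pssm:
        nxt = defaultdict(float)
        for score, prob in dist.items():
            for base, p_base in background.items():
                nxt[score + column[base]] += prob * p_base
        dist = dict(nxt)
    return dist


def threshold_for_alpha(dist, alpha_site):
    """Lowest score t with P(score >= t) <= alpha_site, or None if none exists."""
    tail = 0.0
    best = None
    for score in sorted(dist, reverse=True):
        tail += dist[score]
        if tail > alpha_site:
            break
        best = score
    return best


def per_site_alpha(alpha_seq, n_positions):
    """Per-site level giving P(at least one false positive) = alpha_seq.

    Assumes the n_positions windows are independent, which is a
    simplification (overlapping windows are correlated).
    """
    return 1.0 - (1.0 - alpha_seq) ** (1.0 / n_positions)


if __name__ == "__main__":
    dist = score_distribution(TOY_PSSM, UNIFORM_BACKGROUND)
    alpha = per_site_alpha(0.01, 350)
    print(threshold_for_alpha(dist, alpha))
```

Integer scores keep the table of reachable scores small. Real log-odds matrices are usually rounded to a fixed precision before this kind of computation, and how that rounding is done is a design choice not specified in the statement above.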
“…[39,44,51]), use a brute-force sliding window approach, and the profile matching problem is still regarded as a not yet satisfactorily well-solved problem in computational biology [20]. In recent years a bunch of advanced algorithms based on score properties [53], indexing data structures [6,7,13], Fast Fourier Transform [41], data compression [17], matrix partitioning [33], filtering algorithms [7,33,38], pattern matching [38], and superalphabet [38] have been proposed to reduce the expected time of computation. The aim of this paper is to survey these methods to give the reader an overview of the state of the art of the topic and possibly stimulate future research in the field.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
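
For orientation, the brute-force sliding window baseline mentioned in the statement above can be sketched in a few lines. This is an illustrative sketch only: the function name, the dict-per-column matrix layout, and the toy scores are assumptions, and it is not the ESAsearch algorithm or any surveyed method.

```python
def sliding_window_matches(sequence, pssm, threshold):
    """Score every window of len(pssm) characters and keep those >= threshold.

    pssm is a list with one dict per motif column, mapping a character
    to its score (a hypothetical layout chosen for this example).
    Runs in O(len(sequence) * len(pssm)) time.
    """
    m = len(pssm)
    hits = []
    for start in range(len(sequence) - m + 1):
        score = sum(pssm[j][sequence[start + j]] for j in range(m))
        if score >= threshold:
            hits.append((start, score))
    return hits


if __name__ == "__main__":
    toy_pssm = [
        {"A": 2, "C": -1, "G": -1, "T": -1},
        {"A": -1, "C": 3, "G": -1, "T": -1},
    ]
    print(sliding_window_matches("GACACT", toy_pssm, threshold=5))
```

Every window is scored in full, so the cost grows with the product of sequence length and matrix length. That cost is what the index-based and filtering approaches listed in the statement aim to reduce.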