2006
DOI: 10.1093/nar/gkl585

A graph-based motif detection algorithm models complex nucleotide dependencies in transcription factor binding sites

Abstract: Given a set of known binding sites for a specific transcription factor, it is possible to build a model of the transcription factor binding site, usually called a motif model, and use this model to search for other sites that bind the same transcription factor. Typically, this search is performed using a position-specific scoring matrix (PSSM), also known as a position weight matrix. In this paper we analyze a set of eukaryotic transcription factor binding sites and show that there is extensive clustering of s…
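For readers unfamiliar with the PSSM search the abstract mentions, the following is a minimal sketch of the conventional approach, not the graph-based algorithm the paper proposes. The function names (`build_pssm`, `score`, `scan`), the pseudocount of 1, the uniform background of 0.25, the toy binding sites, and the threshold are all assumptions chosen for illustration.

```python
import math

BASES = "ACGT"

def build_pssm(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds PSSM from equal-length binding sites.

    Assumptions: uniform background frequency and a fixed pseudocount.
    Returns one dict per motif position, mapping base -> score.
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    pssm = []
    for i in range(length):
        # Start every base at the pseudocount so unseen bases stay finite.
        counts = {b: pseudocount for b in BASES}
        for s in sites:
            counts[s[i]] += 1
        total = sum(counts.values())
        pssm.append({b: math.log2((counts[b] / total) / background)
                     for b in BASES})
    return pssm

def score(pssm, window):
    """Sum the per-position scores; positions are treated as independent."""
    return sum(col[base] for col, base in zip(pssm, window))

def scan(pssm, sequence, threshold):
    """Return (start, score) for every window scoring at or above threshold."""
    width = len(pssm)
    hits = []
    for start in range(len(sequence) - width + 1):
        s = score(pssm, sequence[start:start + width])
        if s >= threshold:
            hits.append((start, s))
    return hits

# Toy example with hypothetical sites (not data from the paper).
sites = ["TATAAT", "TATAAT", "TACAAT", "TATGAT"]
pssm = build_pssm(sites)
hits = scan(pssm, "GGGTATAATGGG", threshold=5.0)
```

Because `score` adds each position's contribution separately, a model of this kind cannot express dependencies between positions, which is the limitation that dependency-aware motif models are meant to address.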

Cited by 21 publications (15 citation statements)
References: 32 publications
“…There are many methods, as well as software tools, for modeling TFBSs in terms of PWMs (11–17), as well as more advanced techniques that consider dependencies between nucleotides in different positions (18), but the vast majority are based on the assumption that letter representations of DNA sequences suitably capture the physicochemical properties of DNA (and proteins) that govern the specificity of protein–DNA interactions. However, the general validity of this assumption is questionable (19).…”
Section: Introduction (mentioning)
confidence: 99%
“…While requiring perfect conservation across many genomes is of limited use, the increased power enables approaches that account for artifacts in sequencing, assembly and alignment, and tolerate diverged, missing, or moved motif instances. Our BLS measure is more generally applicable to PWMs (Stormo 2000), to more complex models of regulatory motifs that account for dependencies between individual motif positions (Yada et al 1998; Naughton et al 2006), and to more advanced rules for miRNA-target recognition that for example score the contribution of the 3′ pairing energy (Stark et al 2003; Brennecke et al 2005).…”
Section: Discussion (mentioning)
confidence: 99%
“…We also note that lasso regression has been used elsewhere for learning regulatory networks in bacteria using time course expression data [21], and standard PLS has been used with a collection of known motifs in linear modeling of expression data in yeast and bacteria [22]. Finally, graph-based motif representations have been used previously by other groups, for example Naughton et al [23], but this work again falls into the “cluster-first” category in that it seeks to find overrepresented motifs for a predefined gene set. By contrast, we learn motifs via a global regression problem, and the graph structure is encoded as a constraint on the solution.…”
Section: Discussion (mentioning)
confidence: 99%