2003
DOI: 10.1093/nar/gkg540

Cluster-Buster: finding dense clusters of motifs in DNA sequences

Abstract: The signals that determine activation and repression of specific genes in response to appropriate stimuli are one of the most important, but least understood, types of information encoded in genomic DNA. The nucleotide sequence patterns, or motifs, preferentially bound by various transcription factors have been collected in databases. However, these motifs appear to be individually too short and degenerate to enable detection of functional enhancer and silencer elements within a large genome. Several groups ha…

Cited by 287 publications (304 citation statements)
References 17 publications
“…Motifs were selected based on their overall performance and low occurrence in the negative set (Supplemental Materials and Methods). For each obtained motif or the combination of all 10 motifs, Cluster-Buster (Frith et al 2003) was used to score the positive and negative sets using -c 0 and -m 0. The highest motif score (or CRM score) for each region was obtained and used to determine the predictive value of each motif to classify regions into positives or negatives.…”
Section: Library Preparations (mentioning)
confidence: 99%
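The scoring step quoted above (keep the highest score per region, then ask how well that score separates positives from negatives) can be sketched as follows. This is a minimal illustration, not the cited authors' pipeline: the statement does not say which measure of "predictive value" was used, so the ROC AUC computed here is an assumption, and the function names, region IDs, and scores are all hypothetical. Parsing of actual Cluster-Buster output is left out.

```python
def best_score_per_region(hits):
    """Collapse (region_id, score) pairs to the highest score seen per region."""
    best = {}
    for region, score in hits:
        if region not in best or score > best[region]:
            best[region] = score
    return best


def roc_auc(pos_scores, neg_scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a win. Quadratic in the number of regions, which is
    fine for a sketch but slow for large sets.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical toy hits: (region ID, score reported for one hit in that region).
pos_hits = [("p1", 4.2), ("p1", 7.9), ("p2", 3.1), ("p3", 6.0)]
neg_hits = [("n1", 1.0), ("n2", 3.1), ("n2", 0.4), ("n3", 2.5)]

pos_best = best_score_per_region(pos_hits)
neg_best = best_score_per_region(neg_hits)
auc = roc_auc(list(pos_best.values()), list(neg_best.values()))
print(f"AUC = {auc:.3f}")
```

Regions with no reported hit do not appear in the dictionaries above; in practice they would need an explicit score (for example 0) so that they still count toward the comparison.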
“…The quality of each model was estimated in fivefold cross validations. For each PWM, the motif score was calculated using a Hidden Markov Model as implemented in Cluster-Buster (Frith et al 2003). Number of coding genes and lncRNAs was calculated using BEDTools (Quinlan and Hall 2010) and a custom bash script.…”
Section: Random Forest Model and Feature-vector Representation (mentioning)
confidence: 99%
“…Given a chromosomal region and a genome build, TAR-Vis fetches the sequence region (including at least 1000 bp upstream and downstream in order to avoid boundary conditions on the subsequent calculations) from Ensembl's main databases and copies it to the local machine. From there, various calculations are run on the selected region, including Eponine TSS detection (Down and Hubbard 2002), Cluster-Buster (Frith et al 2003) transcription factor binding site detection (using the JASPAR TFBS database), CpG island detection, and G/C content graphing. Finally, all surrounding gene annotations are collected from Ensembl's annotation server.…”
Section: TAR-Vis (mentioning)
confidence: 99%
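The flanking step described in the TAR-Vis statement (extend the requested region by at least 1000 bp on each side before running the analyses) might look like the sketch below. It assumes 1-based inclusive coordinates and clamps at the chromosome ends; `pad_region` and its parameters are hypothetical names, not part of TAR-Vis, and the actual sequence fetch from Ensembl is not shown.

```python
def pad_region(start, end, flank=1000, chrom_length=None):
    """Extend a 1-based inclusive interval by `flank` bp on both sides.

    The result is clamped to position 1 on the left and, when the chromosome
    length is known, to `chrom_length` on the right.
    """
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    padded_start = max(1, start - flank)
    padded_end = end + flank
    if chrom_length is not None:
        padded_end = min(chrom_length, padded_end)
    return padded_start, padded_end


# A region well inside the chromosome gains the full flank on both sides.
print(pad_region(5000, 6000))
# A region near the start of the chromosome is clamped at position 1.
print(pad_region(500, 800))
```

Analyses such as window-based G/C content then see real sequence on both sides of the region of interest, and their output can be trimmed back to the original coordinates afterwards.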
“…TF motifs have been used to produce a genomewide map of TF binding sites [139], and predicting CRMs based on their higher densities has been shown to be beneficial [140][141][142][143]. If the identity of TFs active in the cell type of interest and their motifs is known, the predictive power of the methods increases for that cell type [144][145][146][147][148][149][150]. In a complementary approach, the loci of genes with a similar function can be searched for common TF binding sites [151][152][153][154].…”
Section: Human TSSs (mentioning)
confidence: 99%
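The idea of predicting CRMs from a higher density of TF binding sites can be illustrated with a simple window count over predicted site positions. This is a deliberately naive sketch: Cluster-Buster itself scores clusters with a hidden Markov model, which is not reproduced here, and the function name, window size, threshold, and positions are hypothetical.

```python
from bisect import bisect_right


def dense_windows(site_positions, window=500, min_sites=3):
    """Report windows that start at a site and hold at least `min_sites` sites.

    Each result is (first_site, last_site_in_window, site_count), where the
    window covers positions first_site .. first_site + window - 1.
    Overlapping windows are reported separately and are not merged.
    """
    positions = sorted(site_positions)
    dense = []
    for i, start in enumerate(positions):
        # Index just past the last site that still falls inside the window.
        j = bisect_right(positions, start + window - 1)
        count = j - i
        if count >= min_sites:
            dense.append((start, positions[j - 1], count))
    return dense


# Hypothetical predicted binding-site positions on one sequence.
sites = [100, 150, 300, 2000, 5000, 5100, 5200, 5450]
for first, last, count in dense_windows(sites):
    print(f"{first}-{last}: {count} sites")
```

A fixed window and count threshold ignore motif strength and spacing, which is one reason model-based scoring is preferred in practice; the sketch only shows why isolated sites (such as the one at position 2000) are filtered out while clustered ones are kept.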
“…The approach there was to compile PWMs of known muscle (liver) TFs and use them to learn a logistic regression model to classify between muscle (liver) and non-muscle (non-liver) regulatory regions. Since then many methods have been developed that train a model based on TF motifs occurring in a set of CRMs to make novel predictions; in many cases, a set of motifs is needed to be provided by the user [149,150,[162][163][164], in others overrepresented motifs or words are learned de novo from the data [165][166][167][168][169]. Table 2 shows a description of several CRM-detection methods grouped according to the type of data they require.…”
Section: Human TSSs (mentioning)
confidence: 99%