The annotation of transcription factor binding sites in newly sequenced genomes is an important and challenging problem. We have previously shown how a regression model that linearly relates gene expression levels to the matching scores of nucleotide patterns allows us to identify DNA-binding sites from a collection of co-regulated genes and their nearby non-coding DNA sequences. Our methodology uses Bayesian models and stochastic search techniques to select transcription factor binding site candidates. Here we show that this methodology also allows us to identify binding sites in closely related species. We present examples of annotation transfer from Schizosaccharomyces pombe to Schizosaccharomyces japonicus. We found that the eng1 motif also regulates a set of 9 genes in S. japonicus. Our framework may therefore be useful for carrying information over into the annotation of a new species. Finally, we discuss a number of statistical and biological issues related to the identification of binding sites through covariates of gene expression and sequence.
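The core idea of relating expression linearly to pattern matching scores can be illustrated with a small sketch. It is a deliberate simplification and not the authors' method: ordinary least squares stands in for the Bayesian model, exhaustive enumeration of short words stands in for the stochastic search, and the matching score is assumed to be a plain occurrence count. The function names and the toy data are hypothetical.

```python
from itertools import product

def match_score(sequence, word):
    """Count (possibly overlapping) exact occurrences of `word` in `sequence`."""
    k = len(word)
    return sum(1 for i in range(len(sequence) - k + 1) if sequence[i:i + k] == word)

def fit_line(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        # A constant covariate or response carries no information.
        return my, 0.0, 0.0
    b = sxy / sxx
    return my - b * mx, b, (sxy * sxy) / (sxx * syy)

def rank_words(sequences, expression, k):
    """Rank every DNA word of length k by how well its counts explain expression."""
    results = []
    for letters in product("ACGT", repeat=k):
        word = "".join(letters)
        scores = [match_score(s, word) for s in sequences]
        _, slope, r2 = fit_line(scores, expression)
        results.append((r2, word, slope))
    results.sort(reverse=True)
    return results

# Toy data (invented for illustration): upstream sequences and expression levels.
upstream = ["TTACGTAA", "ACGTACGT", "TTTTTTTT", "GGACGTCC", "CCCCGGGG"]
expression = [1.1, 2.0, 0.1, 0.9, 0.0]

for r2, word, slope in rank_words(upstream, expression, 4)[:3]:
    print(f"{word}  r2={r2:.3f}  slope={slope:.3f}")
```

Words whose counts track expression with a positive slope are candidate binding sites in this simplified setting; a Bayesian variable-selection model would in addition quantify the uncertainty of each candidate.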
Identifying coordinately regulated genes is an important and challenging task in bioinformatics and a first step toward elucidating the topology of transcriptional networks. We first compare, in a grid setting, the performance of the Markov clustering algorithm with that of k-means on microarray test data sets. The expression information of the clustered genes can then be used to annotate transcription factor binding sites upstream of co-regulated genes. The methodology uses a regression model that relates gene expression levels to the matching scores of nucleotide patterns, allowing us to identify DNA-binding sites from a collection of noncoding DNA sequences from co-regulated genes. Here we discuss extending the approach to multiple species by exploiting the grid framework.
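For readers unfamiliar with Markov clustering, the following is a minimal sketch of its expansion and inflation loop on a small gene similarity matrix. It assumes that the algorithm referred to above is the standard MCL procedure; the inflation value, the iteration count, the membership threshold, and the toy matrix are illustrative choices, not settings taken from the study.

```python
def normalize_columns(m):
    """Scale each column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]

def expand(m):
    """Expansion step: square the matrix (two-step random walks)."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def inflate(m, power):
    """Inflation step: element-wise power, then column re-normalization."""
    return normalize_columns([[v ** power for v in row] for row in m])

def markov_cluster(similarity, inflation=2.0, iterations=50):
    """Return clusters as sorted lists of node indices."""
    n = len(similarity)
    # Add self-loops, as is customary, before normalizing.
    m = [[similarity[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        m = inflate(expand(m), inflation)
    # Each non-zero row lists the nodes attracted to that row's node.
    clusters = set()
    for row in m:
        members = frozenset(j for j, v in enumerate(row) if v > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)

# Toy similarity matrix (invented): two groups of three mutually similar genes.
similarity = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(markov_cluster(similarity))
```

Unlike k-means, this procedure does not take the number of clusters as input; cluster granularity is instead controlled by the inflation parameter, which is one reason the two methods can behave differently on the same expression data.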