Establishing transcriptional regulatory networks by analysis of gene expression data and promoter sequences shows great promise. We developed a novel promoter classification method using a Relevance Vector Machine (RVM) and Bayesian statistical principles to identify discriminatory features in the promoter sequences of genes that can correctly classify transcriptional responses. The method was applied to microarray data obtained from Arabidopsis seedlings treated with glucose or abscisic acid (ABA). Of those genes showing >2.5-fold changes in expression level, ~70% were correctly predicted as being up- or down-regulated (under 10-fold cross-validation), based on the presence or absence of a small set of discriminative promoter motifs. Many of these motifs have known regulatory functions in sugar- and ABA-mediated gene expression. One promoter motif that was not known to be involved in glucose-responsive gene expression was identified as the strongest classifier of glucose-up-regulated gene expression. We show it confers glucose-responsive gene expression in conjunction with another promoter motif, thus validating the classification method. We were able to establish a detailed model of glucose and ABA transcriptional regulatory networks and their interactions, which will help us to understand the mechanisms linking metabolism with growth in Arabidopsis. This study shows that machine learning strategies coupled to Bayesian statistical methods hold significant promise for identifying functionally significant promoter sequences.
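The classification described above rests on encoding each promoter by the presence or absence of candidate motifs and scoring predictions under 10-fold cross-validation. The following is a minimal Python sketch of those two steps only, not the authors' implementation: the motif list, the gene names, the sequences, and the helper functions `motif_features` and `k_fold_indices` are all hypothetical, only one strand is scanned, and the RVM classifier itself is not shown.

```python
from typing import Dict, List

# Hypothetical motifs for illustration only; these are NOT the motifs
# identified in the study.
MOTIFS: List[str] = ["ACGTG", "TATCCA", "AAACCCTA"]


def motif_features(promoter: str, motifs: List[str] = MOTIFS) -> List[int]:
    """Return a 0/1 vector: 1 if the motif occurs anywhere in the promoter.

    Assumption: only the given strand is scanned and matches are exact.
    """
    seq = promoter.upper()
    return [1 if motif in seq else 0 for motif in motifs]


def k_fold_indices(n: int, k: int = 10) -> List[List[int]]:
    """Assign indices 0..n-1 to k folds of near-equal size (round-robin)."""
    return [list(range(fold, n, k)) for fold in range(k)]


# Toy promoter sequences (invented), keyed by hypothetical gene names.
promoters: Dict[str, str] = {
    "geneA": "ttgacACGTGgcatTATCCAaa",
    "geneB": "ccgtAAACCCTAggtt",
    "geneC": "ggggcccctttt",
}

# One binary feature vector per gene; these would be the classifier inputs.
features: Dict[str, List[int]] = {
    gene: motif_features(seq) for gene, seq in promoters.items()
}

# Each fold is held out once while the classifier is trained on the rest.
folds: List[List[int]] = k_fold_indices(n=25, k=10)
```

In a full pipeline, each fold would be held out in turn, a sparse classifier trained on the remaining genes, and the held-out predictions pooled to obtain an accuracy figure comparable to the one reported.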
Six statistical and two dynamical downscaling models were compared with regard to their ability to downscale seven seasonal indices of heavy precipitation for two station networks in northwest and southeast England. The skill among the eight downscaling models was high for those indices and seasons that had greater spatial coherence. Generally, winter showed the highest downscaling skill and summer the lowest. The rainfall indices that were indicative of rainfall occurrence were better modelled than those indicative of intensity. Models based on non-linear artificial neural networks were found to be the best at modelling the inter-annual variability of the indices; however, their strong negative biases implied a tendency to underestimate extremes. A novel approach used in one of the neural network models to output the rainfall probability and the gamma distribution scale and shape parameters for each day meant that resampling methods could be used to circumvent the underestimation of extremes. Six of the models were applied to the Hadley Centre global circulation model HadAM3P forced by emissions according to two SRES scenarios. This revealed that the inter-model differences between the future changes in the downscaled precipitation indices were at least as large as the differences between the emission scenarios for a single model. This implies caution when interpreting the output from a single model or a single type of model (e.g. regional climate models) and the advantage of including as many different types of downscaling models, global models and emission scenarios as possible when developing climate-change projections at the local scale.
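The resampling idea mentioned above can be illustrated with a short Python sketch. It assumes that, for each day, a model has already produced a wet-day probability and gamma shape and scale parameters; the function names and all numerical values below are invented for illustration and are not taken from the study. Drawing from the predicted distribution, rather than reporting its mean, keeps the upper tail that a mean-only prediction smooths away.

```python
import random
from typing import List, Tuple

# (wet-day probability, gamma shape, gamma scale in mm) for each day.
DayParams = Tuple[float, float, float]


def sample_daily_rainfall(p_wet: float, shape: float, scale: float,
                          rng: random.Random) -> float:
    """Draw one daily rainfall amount from a Bernoulli-gamma mixture."""
    if rng.random() >= p_wet:
        return 0.0  # dry day
    # random.gammavariate(alpha, beta): alpha is the shape, beta the scale.
    return rng.gammavariate(shape, scale)


def simulate_series(params: List[DayParams], n_draws: int,
                    seed: int = 0) -> List[List[float]]:
    """Resample n_draws synthetic rainfall series from the daily parameters."""
    rng = random.Random(seed)
    return [
        [sample_daily_rainfall(p, a, b, rng) for (p, a, b) in params]
        for _ in range(n_draws)
    ]


def expected_rainfall(p_wet: float, shape: float, scale: float) -> float:
    """Mean of the mixture: P(wet) times the gamma mean (shape * scale)."""
    return p_wet * shape * scale


# Invented parameters for three days (illustrative only).
params: List[DayParams] = [(0.6, 0.8, 5.0), (0.0, 1.0, 1.0), (1.0, 2.0, 3.0)]
series = simulate_series(params, n_draws=2000, seed=42)

# Compare the mean prediction for day 0 with the largest resampled value.
day0 = [s[0] for s in series]
print("expected:", expected_rainfall(*params[0]), "max sampled:", max(day0))
```

Indices of heavy precipitation, such as a seasonal maximum or a high percentile, can then be computed on each resampled series and summarised across draws.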
A MATLAB implementation of the sparse logistic regression algorithm with Bayesian regularization (BLogReg) is available from http://theoval.cmp.uea.ac.uk/~gcc/cbl/blogreg/