The identification of direct targets of transcription factors is a key problem in the study of gene regulatory networks. However, the use of high-throughput experimental methods, such as ChIP-chip and ChIP-sequencing, is limited by their high cost and strong dependence on cell type and context. We developed a computational method for the genome-wide identification of functional transcription factor binding sites based on positional weight matrices, comparative genomics, and gene expression profiling. The method was applied to Stat3, a transcription factor that plays crucial roles in inflammation, immunity, and oncogenesis and is able to induce distinct subsets of target genes in different cell types or conditions. A newly generated positional weight matrix enabled us to assign affinity scores of high specificity, as measured by EMSA competition assays. Phylogenetic conservation with 7 vertebrate species was used to select the binding sites most likely to be functional. Validation was carried out on predicted sites within genes identified as differentially expressed in the presence or absence of Stat3 by microarray analysis. Twelve of the fourteen sites tested were bound by Stat3 in vivo, as assessed by chromatin immunoprecipitation, allowing us to identify 9 Stat3 transcriptional targets.
Given its high validation rate, and the availability of large transcription factor-dependent gene expression datasets obtained under diverse experimental conditions, our approach appears to be a valid alternative to high-throughput experimental assays for the discovery of novel direct targets of transcription factors.
chromatin immunoprecipitation | phylogenetic footprint | positional weight matrix | Stat3 binding sites | Stat3 target genes
Functional transcription factor binding sites (TFBSs) can be identified on a genomic scale either by computational approaches or through elaborate procedures such as chromatin immunoprecipitation followed by either genomic microchip hybridization (ChIP on Chip) or deep sequencing (ChIP and Sequencing) (1). The experimental procedures have the advantage of directly measuring the in vivo occupancy of genomic sites. By definition, however, each experiment can identify only the sites bound under the specific conditions analyzed: separate experiments have to be performed for each condition or tissue type of interest. This is particularly true for the many transcription factors (TFs) that are known to induce distinct sets of genes in different tissues. Indeed, sets of TFBSs identified with these techniques in different conditions often show limited overlap. Predictions based on computational sequence analysis (2), by contrast, are in principle independent of the cellular context. The ample collection of candidate BSs thus produced can then be used to identify transcriptional targets, either as such or within lists of differentially expressed genes generated by microarray experiments, which in many cases are already available through public databases.
The standard way to describe degenerate cis-regulator...