The ArcAB two-component system of Escherichia coli regulates the aerobic/anaerobic expression of genes that encode respiratory proteins whose synthesis is coordinated during aerobic/anaerobic cell growth. A genomic study of E. coli was undertaken to identify other potential targets of oxygen and ArcA regulation. A group of 175 genes generated from this study and our previous study on oxygen regulation (Salmon, K., Hung, S. P., Mekjian, K., Baldi, P., Hatfield, G. W., and Gunsalus, R. P. (2003) J. Biol. Chem. 278, 29837-29855), called our gold standard gene set, have p values <0.00013 and a posterior probability of differential expression value of 0.99. These 175 genes clustered into eight expression patterns and represent genes involved in a large number of cell processes, including small molecule biosynthesis, macromolecular synthesis, and aerobic/anaerobic respiration and fermentation. In addition, 119 of these 175 genes were also identified in our previous study of the fnr allele. A MEME/weight matrix method was used to identify a new putative ArcA-binding site for all genes of the E. coli genome. Sixteen new sites were identified upstream of genes in our gold standard set. The strict statistical analyses that we have performed on our data allow us to predict that 1139 genes in the E. coli genome are regulated either directly or indirectly by the ArcA protein with a 99% confidence level.
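The weight matrix half of the MEME/weight matrix method mentioned above can be illustrated with a minimal sketch. It assumes a set of already aligned sites (the step a motif finder such as MEME would supply), a uniform background, and a pseudocount of 1; the toy sites, the upstream sequence, and the function names `build_log_odds` and `scan` are hypothetical and are not taken from the study.

```python
import math

def build_log_odds(sites, background=0.25, pseudocount=1.0):
    """Build a log-odds weight matrix from equal-length aligned sites."""
    width = len(sites[0])
    matrix = []
    for pos in range(width):
        # Start every base at the pseudocount so unseen bases get a finite score.
        counts = {base: pseudocount for base in "ACGT"}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        matrix.append(
            {b: math.log2((counts[b] / total) / background) for b in "ACGT"}
        )
    return matrix

def scan(sequence, matrix):
    """Score every window of the sequence; return (offset, score) pairs."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(matrix[i][base] for i, base in enumerate(window))
        hits.append((start, score))
    return hits

# Hypothetical aligned sites, invented for illustration (not real ArcA sites).
toy_sites = ["GTTAAT", "GTTAAC", "GTTCAT", "ATTAAT"]
pwm = build_log_odds(toy_sites)

# Hypothetical upstream region containing one consensus-like window.
upstream = "CCGGTTAATCCG"
hits = scan(upstream, pwm)
best_offset, best_score = max(hits, key=lambda hit: hit[1])
print(best_offset, round(best_score, 2))
```

In a genome-wide search, the same scan would be run over the upstream region of every gene, with a score threshold chosen to control the false-positive rate; that threshold is not specified in the text above.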
Integration host factor (IHF) is a bacterial histone-like protein whose primary biological role is to condense the bacterial nucleoid and to constrain DNA supercoils. It does so by binding in a sequence-independent manner throughout the genome. However, unlike other structurally related bacterial histone-like proteins, IHF has evolved a sequence-dependent, high affinity DNA-binding motif. The high affinity binding sites are important for the regulation of a wide range of cellular processes. A remarkable feature of IHF is that it employs an indirect readout mechanism to bind and wrap DNA at both the nonspecific and high affinity (sequence-dependent) DNA sites. In this study we assessed the contributions of pre-formed and protein-induced DNA conformations to the energetics of IHF binding. Binding energies determined experimentally were compared with energies predicted for the IHF-induced deformation of the DNA helix (DNA deformation energy) in the IHF-DNA complex. Combinatorial sets of de novo DNA sequences were designed to systematically evaluate the influence of sequence-dependent structural characteristics of the conserved IHF recognition elements of the consensus DNA sequence. We show that IHF recognizes pre-formed conformational characteristics of the consensus DNA sequence at high affinity sites, whereas at all other sites relative affinity is determined by the deformational energy required for nearest-neighbor base pairs to adopt the DNA structure of the bound DNA-IHF complex.
Site-specific DNA binding by regulatory proteins is a feature of the regulatory processes that maintain, expand, and express genetic information such as replication, recombination, transposition, and transcription. The chemical and physical mechanisms that underlie sequence-specific recognition of regulatory elements by cognate DNA-binding proteins are typically classified as direct versus indirect readout.
The former refers primarily to hydrogen bonds between proteins and the unique extra-cyclic substituents at C-4 of pyrimidines, C-6 of purines, and N-7 of purines. These groups provide a base pair-specific pattern of hydrogen bond donors and acceptors in the major groove of DNA that can be directly read by a complementary pattern of amino acid side chain donors and acceptors. Indirect readout refers to recognition of aspects of DNA structure such as intrinsic curvature, topology of major and minor grooves, ordered water structures, local geometry of backbone phosphates, and flexibility or deformability. Because both the local DNA structure and energy to deform DNA are themselves intrinsic sequence-dependent properties, the conserved sequences that distinguish binding sites necessarily include contributions from both direct and indirect mechanisms. Consequently, although the contribution from indirect mechanisms is expected to be significant in protein-DNA complexes that feature substantial DNA deformation, it has proven difficult to evaluate these contributions quantitatively. A protein that relies exclusively, or primarily, on indir...
The starting structure motif was a crystal structure of DNA bound to the integration host factor protein (IHF) of E. coli. IHF is known to exhibit both direct and indirect recognition of its binding sites. (1) Threading DNA sequences onto the crystal structure showed statistically significant partial separation of 60 IHF binding sites from random and intragenic sequences and was positively correlated with binding affinity. (2) The crystal structure was shown to be equivalent to a linear Markov network, and so, to a joint probability distribution over sequences, computable in linear time. It was transformed algorithmically into several common pure-sequence representations, including (a) small sets of short exact strings, (b) weight matrices, (c) consensus regular patterns, (d) multiple sequence alignments, and (e) phylogenetic trees. In all cases the pure-sequence motifs retained statistically significant partial separation of the IHF binding sites from random and intragenic sequences. Most exhibited positive correlation with binding affinity. The multiple alignment showed some conserved columns, and the phylogenetic tree partially mixed low-energy sequences with IHF binding sites but separated high-energy sequences. The conclusion is that deformation energy explains part of indirect recognition, which explains part of IHF sequence-specific binding.
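The linear-time claim above can be made concrete with a minimal sketch. It assumes that threading assigns each base-pair step an energy depending only on the dinucleotide at that step, and that sequence probabilities follow a Boltzmann weighting exp(-E/kT); both are assumptions made here for illustration. The energy tables are invented toy values, not the crystal-structure energies, and the names `thread_energy`, `partition_function`, and `toy_tables` are hypothetical.

```python
import itertools
import math

BASES = "ACGT"

def thread_energy(seq, step_energy):
    """Sum the per-step energies of a sequence threaded onto the structure."""
    return sum(step_energy[i][seq[i] + seq[i + 1]] for i in range(len(seq) - 1))

def partition_function(step_energy, kT=1.0):
    """Forward recursion along the chain: one pass over the steps,
    instead of enumerating all 4**length sequences."""
    alpha = {b: 1.0 for b in BASES}  # weight of all prefixes ending in base b
    for table in step_energy:
        alpha = {
            nxt: sum(alpha[prev] * math.exp(-table[prev + nxt] / kT)
                     for prev in BASES)
            for nxt in BASES
        }
    return sum(alpha.values())

def probability(seq, step_energy, kT=1.0):
    """Boltzmann probability of one sequence under the chain model."""
    weight = math.exp(-thread_energy(seq, step_energy) / kT)
    return weight / partition_function(step_energy, kT)

def toy_tables(n_steps):
    """Hypothetical step energies (arbitrary units), for illustration only."""
    return [
        {a + b: ((3 * BASES.index(a) + 5 * BASES.index(b) + i) % 7) / 2.0
         for a in BASES for b in BASES}
        for i in range(n_steps)
    ]

tables = toy_tables(4)  # 4 steps, so sequences of length 5

# Sanity check by brute force: the probabilities of all 4**5 sequences sum to 1.
total = sum(probability("".join(s), tables)
            for s in itertools.product(BASES, repeat=5))
brute_z = sum(math.exp(-thread_energy("".join(s), tables))
              for s in itertools.product(BASES, repeat=5))
print(round(total, 6))
```

The brute-force check is only feasible for short toy sequences; the forward recursion is what keeps the computation linear in the length of the binding site.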
Proteins that bind to specific locations in genomic DNA control many basic cellular functions. Proteins detect their binding sites using both direct and indirect recognition mechanisms. Deformation energy, which models the energy required to bend DNA from its native shape to its shape when bound to a protein, has been shown to be an indirect recognition mechanism for one particular protein, Integration Host Factor (IHF). This work extends the analysis of deformation to two other DNA-binding proteins, CRP and SRF, and two endonucleases, I-CreI and I-PpoI. Known binding sites for all five proteins showed statistically significant differences in mean deformation energy as compared to random sequences. Binding sites for the three DNA-binding proteins and one of the endonucleases had mean deformation energies lower than random sequences. Binding sites for I-PpoI had mean deformation energy higher than random sequences. Classifiers that were trained using the deformation energy at each base pair step showed good cross-validated accuracy when classifying unseen sequences as binders or nonbinders. These results support DNA deformation energy as an indirect recognition mechanism across a wider range of DNA-binding proteins. Deformation energy may also have a predictive capacity for the underlying catalytic mechanism of DNA-binding enzymes.
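The abstract above does not name the classifier that was used, so the following is only a sketch of the general idea: each sequence is represented by its vector of per-step deformation energies, and accuracy is estimated by cross-validation. A nearest-centroid classifier with leave-one-out cross-validation is swapped in here as the simplest stand-in; the energy vectors are invented toy data and the function names are hypothetical.

```python
def centroid(vectors):
    """Mean vector of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def sq_dist(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def loo_accuracy(samples):
    """Leave-one-out accuracy of a nearest-centroid classifier.

    samples: list of (per_step_energies, label) pairs.
    """
    correct = 0
    for i, (features, label) in enumerate(samples):
        # Hold out sample i and fit class centroids on the rest.
        train = samples[:i] + samples[i + 1:]
        centroids = {
            cls: centroid([vec for vec, lab in train if lab == cls])
            for cls in {lab for _, lab in train}
        }
        predicted = min(centroids, key=lambda cls: sq_dist(features, centroids[cls]))
        correct += predicted == label
    return correct / len(samples)

# Hypothetical per-step deformation energies (arbitrary units), invented
# for illustration; binders are given lower energies than nonbinders.
toy = [
    ([1.0, 0.8, 1.2], "binder"),
    ([0.9, 1.1, 1.0], "binder"),
    ([1.2, 0.9, 0.7], "binder"),
    ([3.0, 2.7, 3.1], "nonbinder"),
    ([2.8, 3.2, 2.9], "nonbinder"),
    ([3.1, 3.0, 2.6], "nonbinder"),
]
accuracy = loo_accuracy(toy)
print(accuracy)
```

Holding each sequence out before fitting the centroids is what makes the estimate apply to unseen sequences; on real data the classes would overlap and the accuracy would be lower than on this cleanly separated toy set.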