Xiaoyu Chen scite author profile

Sequence preferences of DNA-binding proteins are a primary mechanism by which cells interpret the genome. Despite these proteins' central importance in physiology, development, and evolution, comprehensive DNA-binding specificities have been determined experimentally for few proteins. Here, we used microarrays containing all 10-base-pair sequences to examine the binding specificities of 104 distinct mouse DNA-binding proteins representing 22 structural classes. Our results reveal a complex landscape of binding, with virtually every protein analyzed possessing unique preferences. Roughly half of the proteins each recognized multiple distinctly different sequence motifs, challenging our molecular understanding of how proteins interact with their DNA binding sites. This complexity in DNA recognition may be important in gene regulation and in evolution of transcriptional regulatory networks.

The interactions between transcription factors (TFs) and their DNA binding sites are an integral part of the gene regulatory networks that control development, core cellular processes, and responses to environmental perturbations. However, only a handful of sequence-specific TFs have been characterized well enough to identify all the sequences that they can and, just as importantly, cannot bind. Computational analysis of microarray readout of chromatin immunoprecipitation experiments (ChIP-chip) suggests extensive use of low-affinity binding sites in yeast (1), and computational models of gene expression during fly embryonic development suggest that low-affinity binding sites contribute as much as high-affinity sites (2). The availability of TF binding data spanning the full affinity range would improve our understanding of the biophysical phenomena underlying protein-DNA recognition, and would improve accuracy in analyzing cis-regulatory elements.

Here we report the comprehensive determination of the DNA binding specificities of 104 known and predicted mouse TFs using the universal protein binding microarray (PBM) technology (3). These TFs represent 22 different DNA binding domain (DBD) structural classes that are the major DBD classes found in metazoan TFs.

We created (4) N-terminal GST fusion constructs of the DBDs of 104 known and predicted mouse TFs (Fig. S1 and Table S1). Five of these proteins (Max, Bhlhb2, Gata3, Rfx3, and Sox7) were also represented as full-length fusions to N-terminal GST, yielding a total set of 109 non-redundant proteins represented by 115 samples (5). Each protein was used in two PBM experiments (6,7) (Figs. S2, S3, S4 and Table S2). DNA binding site motifs initially were derived using the Seed-and-Wobble algorithm (3,8); Seed-and-Wobble first identifies the single 8-mer (ungapped or gapped) with the greatest PBM enrichment score (E-score) (3), and then systematically tests the relative preference of each nucleotide variant at each position both within and outside the seed (5). Later analyses incorporated additional motif-finding algorithms, including RankMotif++ (9) and Kafal (5). Beyond simpl...

Strelka2: fast and accurate calling of germline and somatic variants

Kim et al. 2018

We describe Strelka2 (https://github.com/Illumina/strelka), an open-source small-variant-calling method for research and clinical germline and somatic sequencing applications. Strelka2 introduces a novel mixture-model-based estimation of insertion/deletion error parameters from each sample, an efficient tiered haplotype-modeling strategy, and a normal sample contamination model to improve liquid tumor analysis. For both germline and somatic calling, Strelka2 substantially outperformed the current leading tools in terms of both variant-calling accuracy and computing cost.

Global mapping of protein-DNA interactions in vivo by digital genomic footprinting

et al. 2009

The orchestrated binding of transcriptional activators and repressors to specific DNA sequences in the context of chromatin defines the regulatory program of eukaryotic genomes. We developed a digital approach to assay regulatory protein occupancy on genomic DNA in vivo by dense mapping of individual DNase I cleavages from intact nuclei using massively parallel DNA sequencing. Analysis of > 23 million cleavages across the Saccharomyces cerevisiae genome revealed thousands of protected regulatory protein footprints, enabling de novo derivation of factor binding motifs as well as the identification of hundreds of novel binding sites for major regulators. We observed striking correspondence between nucleotide-level DNase I cleavage patterns and protein-DNA interactions determined by crystallography. The data also yielded a detailed view of larger chromatin features including positioned nucleosomes flanking factor binding regions. Digital genomic footprinting provides a powerful approach to delineate the cis-regulatory framework of any organism with an available genome sequence.

Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications

Diversity and Complexity in DNA Recognition by Transcription Factors

Strelka2: fast and accurate calling of germline and somatic variants

Global mapping of protein-DNA interactions in vivo by digital genomic footprinting