Progression through the middle phase of sporulation in Saccharomyces cerevisiae is promoted by the successful completion of recombination at the end of prophase I. Completion of meiotic recombination allows the activation of the sporulation-specific transcription factor Ndt80, which binds to a specific DNA sequence, the middle sporulation element (MSE), and activates ~150 genes to enable progression through meiosis. Here, we isolate the DNA-binding domain of Ndt80 and determine its crystal structure both free and in complex with an MSE-containing DNA. The structure reveals that Ndt80 is a member of the Ig-fold family of transcription factors. The structure of the DNA-bound form, refined at 1.4 Å, reveals an unexpected mode of recognition of 5′-pyrimidine–guanine-3′ dinucleotide steps by arginine residues that simultaneously recognize the 3′-guanine base through hydrogen bond interactions and the 5′-pyrimidine through stacking/van der Waals interactions. Analysis of the DNA-binding affinities of MSE mutants demonstrates the central importance of these interactions, and of the AT-rich portion of the MSE. Functional similarities between Ndt80 and the Caenorhabditis elegans p53 homolog suggest an evolutionary link between Ndt80 and the p53 family.
Transcription factors (TFs) bind DNA by recognizing specific sequence motifs, typically of length 6–12 bp. A motif can occur many thousands of times in the human genome, but only a subset of those sites is actually bound. Here we present a machine learning framework leveraging existing convolutional neural network architectures and model interpretation techniques to identify and interpret the sequence context features most important for predicting whether a particular motif instance will be bound. We apply our framework to predict binding at motifs for 38 TFs in a lymphoblastoid cell line, score the importance of context sequences at base-pair resolution, and characterize the context features most predictive of binding. We find that the choice of training data heavily influences classification accuracy and the relative importance of features such as open chromatin. Overall, our framework enables novel insights into features predictive of TF binding and is likely to inform future deep learning applications to interpret non-coding genetic variants.
Transcription factors (TFs) bind DNA by recognizing highly specific DNA sequence motifs, typically of length 6–12 bp. A TF motif can occur tens of thousands of times in the human genome, but only a small fraction of those sites is actually bound. Despite the availability of genome-wide TF binding maps for hundreds of TFs, predicting whether a given motif occurrence is bound and identifying the influential context features remain challenging. Here we present a machine learning framework leveraging existing convolutional neural network architectures and state-of-the-art model interpretation techniques to identify, visualize, and interpret the context features most important for determining binding activity for a particular TF. We apply our framework to predict binding at motifs for 38 TFs in a lymphoblastoid cell line and achieve superior classification performance compared to existing frameworks. We compute importance scores for context regions at single-base-pair resolution and uncover known and novel determinants of TF binding. Finally, we demonstrate that important context bases are under increased purifying selection compared to nearby bases and are enriched in disease-associated variants identified by genome-wide association studies.
Short tandem repeats (STRs) are a class of rapidly mutating genetic elements typically characterized by repeated units of 1–6 bp. We leveraged whole genome sequencing data for 152 recombinant inbred (RI) strains from the BXD family of mice to map loci that modulate genome-wide patterns of new mutations arising during parent-to-offspring transmission at STRs. We defined quantitative phenotypes describing the numbers and types of germline STR mutations in each strain and performed quantitative trait locus (QTL) analyses for each of these phenotypes. We identified a locus on Chromosome 13 at which strains inheriting the C57BL/6J (B) haplotype have a higher rate of STR expansions than those inheriting the DBA/2J (D) haplotype. The strongest candidate gene in this locus is Msh3, a known modifier of STR stability in cancer and at pathogenic repeat expansions in mice and humans, and a current drug target against Huntington's disease. The D haplotype at this locus harbors a cluster of variants near the 5′ end of Msh3, including multiple missense variants near the DNA mismatch recognition domain. In contrast, the B haplotype contains a unique retrotransposon insertion. The rate of expansion covaries positively with Msh3 expression, with higher expression from the B haplotype. Finally, detailed analysis of mutation patterns showed that strains carrying the B allele have higher expansion rates, but slightly lower overall total mutation rates, compared to those with the D allele, particularly at tetranucleotide repeats. Our results suggest an important role for inherited variants in Msh3 in modulating genome-wide patterns of germline mutations at STRs.