Direct-Coupling Analysis is a group of methods to harvest information about coevolving residues in a protein family by learning a generative model in an exponential family from data. In protein families of realistic size, this learning can only be done approximately, and there is a trade-off between inference precision and computational speed. We here show that an earlier introduced l2-regularized pseudolikelihood maximization method called plmDCA can be modified so as to be easily parallelizable, as well as inherently faster on a single processor, at negligible difference in accuracy. We test the new incarnation of the method on 148 protein families from the Protein Families database (PFAM), one of the largest tests of this class of algorithms to date.
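As a rough sketch of why such a method can be parallelized, assume the usual Potts-model parametrization with fields h and couplings J over q states, B aligned sequences, and regularization strengths \lambda_h and \lambda_J (none of these symbols appear in the abstract above; they are illustrative assumptions). The pseudolikelihood objective then splits into one l2-regularized term per alignment column r, and each term depends only on that column's own parameters:

```latex
% Conditional probability of the state at column r given all other columns
P(\sigma_r = a \mid \sigma_{\setminus r})
  = \frac{\exp\!\Big( h_r(a) + \sum_{i \neq r} J_{ri}(a, \sigma_i) \Big)}
         {\sum_{l=1}^{q} \exp\!\Big( h_r(l) + \sum_{i \neq r} J_{ri}(l, \sigma_i) \Big)}

% Per-column l2-regularized objective, minimized independently for each r
g_r(h_r, J_r)
  = -\frac{1}{B} \sum_{b=1}^{B}
      \log P\big(\sigma_r = \sigma_r^{(b)} \mid \sigma_{\setminus r}^{(b)}\big)
    + \lambda_h \lVert h_r \rVert_2^2
    + \lambda_J \sum_{i \neq r} \lVert J_{ri} \rVert_2^2
```

Under these assumptions, each g_r can be minimized on a separate processor. Treating the columns independently yields two estimates for every pair of columns (one from each column's problem), which would then need to be reconciled, for instance by averaging.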
DNA can determine where and when genes are expressed, but the full set of sequence determinants that control gene expression is unknown. Here, we measured the transcriptional activity of DNA sequences that represent an ~100 times larger sequence space than the human genome using massively parallel reporter assays (MPRAs). Machine learning models revealed that transcription factors (TFs) generally act in an additive manner with weak grammar and that most enhancers increase expression from a promoter by a mechanism that does not appear to involve specific TF–TF interactions. The enhancers themselves can be classified into three types: classical, closed chromatin and chromatin dependent. We also show that few TFs are strongly active in a cell, with most activities being similar between cell types. Individual TFs can have multiple gene regulatory activities, including chromatin opening and enhancing, promoting and determining transcription start site (TSS) activity, consistent with the view that the TF binding motif is the key atomic unit of gene expression.
Point mutations in cancer have been extensively studied, but chromosomal gains and losses have been more challenging to interpret due to their unspecific nature. Here we examine the high-resolution allelic imbalance (AI) landscape in 1699 colorectal cancers, 256 of which have been whole-genome sequenced (WGSed). The imbalances pinpoint 38 genes as plausible AI targets based on previous knowledge. Unbiased CRISPR-Cas9 knockout and activation screens identified in total 79 genes within AI peaks regulating cell growth. Genetic and functional data implicate loss of TP53 as a sufficient driver of AI. The WGS highlights an influence of copy number aberrations on the rate of detected somatic point mutations. Importantly, the data reveal several associations between AI target genes, suggesting a role for a network of lineage-determining transcription factors in colorectal tumorigenesis. Overall, the results unravel the contribution of AI in colorectal cancer and provide a plausible explanation for why so few genes are commonly affected by point mutations in cancers.
DNA determines where and when genes are expressed, but the full set of sequence determinants that control gene expression is not known. To obtain a global and unbiased view of the relative importance of different sequence determinants in gene expression, we measured transcriptional activity of DNA sequences that are in aggregate ∼100 times longer than the human genome in three different cell types. We show that enhancers can be classified into three main types: classical enhancers, closed chromatin enhancers and chromatin-dependent enhancers, which act via different mechanisms and differ in motif content. Transcription factors (TFs) act generally in an additive manner with weak grammar, with classical enhancers increasing expression from promoters by a mechanism that does not involve specific TF-TF interactions. Few TFs are strongly active in a cell, with most activities similar between cell types. Chromatin-dependent enhancers are enriched in forkhead motifs, whereas classical enhancers contain motifs for TFs with strong transactivator domains such as ETS and bZIP; these motifs are also found at transcription start site (TSS)-proximal positions. However, some TFs, such as NRF1, only activate transcription when placed close to the TSS, and others, such as YY1, display positional preference with respect to the TSS. TFs can thus be classified into four non-exclusive subtypes based on their transcriptional activity: chromatin opening, enhancing, promoting and TSS determining factors – consistent with the view that the binding motif is the only atomic unit of gene expression.