Binding of transcription factors (TFs) at proximal promoters and distal enhancers is central to gene regulation. Yet identification of TF binding sites, also known as regulatory motifs, and quantification of their impact remain challenging. Here we present scover, a convolutional neural network model that can discover putative regulatory motifs along with their cell type-specific importance from single-cell data. Analysis of scRNA-seq data from human kidney shows that ETS, YY1 and NRF1 are the most important motif families for proximal promoters. Using multiple mouse tissues, we obtain for the first time a model with cell type resolution which explains 34% of the variance in gene expression. Finally, by applying scover to distal enhancers identified using scATAC-seq from the mouse cerebral cortex, we highlight the emergence of layer-specific regulatory patterns during development.
The binding of transcription factors at proximal promoters and distal enhancers is central to gene regulation. Identifying regulatory motifs and quantifying their impact on expression remain challenging. Using a convolutional neural network trained on single-cell data, we infer putative regulatory motifs and their cell type-specific importance. Our model, scover, explains 29% of the variance in gene expression in multiple mouse tissues. Applying scover to distal enhancers identified using scATAC-seq from the developing human brain, we identify cell type-specific motif activities. Scover can identify regulatory motifs and their importance from single-cell data, and all of its parameters and outputs are easily interpretable.
The natural habitat of SARS-CoV-2 is the cytoplasm of a mammalian cell, where it replicates its genome and expresses its proteins. While SARS-CoV-2 genes, and hence their codons, are presumably well optimized for mammalian protein translation, they have not been sequence-optimized for nuclear expression. The cDNA of the Spike protein harbors over a hundred predicted splice sites and produces mostly aberrant mRNA transcripts when expressed in the nucleus. While different codon optimization strategies increase the proportion of full-length mRNA, they do not directly address the underlying splicing issue, and commonly detected cryptic splicing events continue to hinder the full expression potential. Similar splicing characteristics were also observed in other transgenes. By inserting multiple short introns throughout different transgenes, significant improvement in expression was achieved, including a >7-fold increase for the Spike transgene. Provision of a more natural genomic landscape offers a novel way to achieve multi-fold improvement in transgene expression.