Massively parallel reporter assays (MPRAs) have enabled the study of transcriptional regulatory mechanisms at an unprecedented scale and with high quantitative resolution. However, the field lacks models that can discover sequence-specific signals de novo from the data and integrate them in a mechanistic way. We present MuSeAM (Multinomial CNNs for Sequence Activity Modeling), a convolutional neural network that fills this gap. MuSeAM uses multinomial convolutions that directly model sequence-specific motifs of protein-DNA binding. We demonstrate that MuSeAM fits MPRA data with high accuracy and generalizes to other tasks, such as predicting chromatin accessibility and prioritizing potentially functional variants.
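As an illustration of what a multinomial convolution might look like, the sketch below constrains each filter position to a probability distribution over A, C, G, and T (via a softmax) and scores a sequence window by its log-odds against a background distribution. This is one plausible reading of the abstract, not the authors' implementation: the function names, the uniform background, and the toy weights are all assumptions made for the example.

```python
import math

ALPHABET = "ACGT"

def softmax(row):
    """Map unconstrained weights to a probability distribution."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def multinomial_filter(raw_weights, background=(0.25, 0.25, 0.25, 0.25)):
    """Convert raw weights (one row of 4 per motif position) into
    log-odds scores. The uniform background is an assumption."""
    scores = []
    for row in raw_weights:
        probs = softmax(row)  # each position is a multinomial over A, C, G, T
        scores.append([math.log(p / b) for p, b in zip(probs, background)])
    return scores

def scan(sequence, scores):
    """Slide the filter along the sequence and sum per-position log-odds."""
    width = len(scores)
    out = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        out.append(sum(scores[i][ALPHABET.index(base)]
                       for i, base in enumerate(window)))
    return out

# Hypothetical filter whose raw weights favor the motif "TATA".
raw = [[0, 0, 0, 3], [3, 0, 0, 0], [0, 0, 0, 3], [3, 0, 0, 0]]
activations = scan("GGTATACC", multinomial_filter(raw))
best_start = max(range(len(activations)), key=activations.__getitem__)
print(best_start, activations)
```

Because every filter position is a proper distribution, a trained filter of this kind can be read directly as a position-specific motif rather than as an arbitrary weight matrix.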
Intercellular communication and spatial organization of cells are two critical aspects of a tissue’s function. Understanding these aspects requires integrating data from single-cell RNA-seq (scRNA-seq) and spatial transcriptomics (ST), two cutting-edge technologies that offer complementary insights into tissue composition, architecture, and function. Integrating these data types is non-trivial because they differ widely in the number of profiled genes and often do not share marker genes for a given cell type. We developed STANN, a neural network model that overcomes these methodological challenges. Given ST and scRNA-seq data of a tissue, STANN models cell types in the scRNA-seq dataset from the genes that are profiled by both ST and scRNA-seq. The trained STANN model then assigns cell types to the ST dataset. We apply STANN to assign cell types in a recent ST dataset (SeqFISH+) of mouse olfactory bulb (MOB). Our analysis of STANN’s assigned cell types revealed principles of tissue architecture and intercellular communication in unprecedented detail. We find that cell-type compositions are disproportionate in the tissue, yet their relative proportions are spatially consistent within individual morphological layers. Surprisingly, within a morphological layer, there is high spatial variation in cell-type colocalization patterns and intercellular communication mechanisms. Our analysis suggests that spatially localized gene regulatory networks may account for this variability in intercellular communication mechanisms.
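The workflow described above (restrict to shared genes, train on scRNA-seq, assign labels to ST) can be sketched as follows. This is a minimal illustration under stated assumptions, not the STANN architecture: a single-layer softmax classifier stands in for the neural network, and the gene names, expression values, and helper functions are hypothetical toy inputs.

```python
import math

def shared_genes(sc_genes, st_genes):
    """Genes profiled by both technologies, in scRNA-seq order."""
    st_set = set(st_genes)
    return [g for g in sc_genes if g in st_set]

def select(matrix, genes, keep):
    """Keep only the columns of `matrix` that correspond to `keep`."""
    idx = [genes.index(g) for g in keep]
    return [[row[i] for i in idx] for row in matrix]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def logits(x, W, b):
    return [sum(w * v for w, v in zip(W[k], x)) + b[k] for k in range(len(W))]

def train(X, y, n_classes, epochs=200, lr=0.5):
    """Softmax regression fitted by plain SGD (stand-in for a deeper network)."""
    W = [[0.0] * len(X[0]) for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        for x, label in zip(X, y):
            p = softmax(logits(x, W, b))
            for k in range(n_classes):
                grad = p[k] - (1.0 if k == label else 0.0)
                for j in range(len(x)):
                    W[k][j] -= lr * grad * x[j]
                b[k] -= lr * grad
    return W, b

def predict(X, W, b):
    """Assign each cell the class with the highest logit."""
    out = []
    for x in X:
        z = logits(x, W, b)
        out.append(max(range(len(z)), key=z.__getitem__))
    return out

# Toy scRNA-seq data: many genes, known cell types (0 = neuron, 1 = astrocyte).
sc_genes = ["Gad1", "Slc17a7", "Gfap", "Mbp"]
sc_expr = [[1.0, 0.2, 0.0, 0.0], [0.9, 0.0, 0.1, 0.0],
           [0.0, 0.0, 1.0, 0.3], [0.1, 0.1, 0.8, 0.0]]
sc_labels = [0, 0, 1, 1]

# Toy ST data: fewer genes, in a different order, no labels.
st_genes = ["Gfap", "Gad1", "Xist"]
st_expr = [[0.9, 0.1, 0.5], [0.0, 1.0, 0.2]]

common = shared_genes(sc_genes, st_genes)
W, b = train(select(sc_expr, sc_genes, common), sc_labels, n_classes=2)
st_labels = predict(select(st_expr, st_genes, common), W, b)
print(common, st_labels)
```

Aligning both matrices to the same ordered list of shared genes before training is what allows a model fitted on one technology to be applied to the other.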