Determining structures of protein complexes is crucial for understanding cellular functions. Here, we describe an integrative structure determination approach that relies on in vivo measurements of genetic interactions. We construct phenotypic profiles for point mutations crossed against gene deletions or exposed to environmental perturbations, and then convert the similarity between two profiles into an upper bound on the distance between the mutated residues. We determine the structure of the yeast histone H3-H4 complex based on ~500,000 genetic interactions of 350 mutants. We then apply the method to subunits Rpb1-Rpb2 of yeast RNA polymerase II and subunits RpoB-RpoC of bacterial RNA polymerase. The accuracy is comparable to that based on chemical cross-links; using restraints from both genetic interactions and cross-links further improves model accuracy and precision. The approach provides an efficient means to augment integrative structure determination with in vivo observations.
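The conversion of profile similarity into a distance restraint can be illustrated with a minimal sketch. The abstract does not state which similarity measure or mapping is used, so everything here is an assumption for illustration: Pearson correlation stands in for the similarity measure, and `upper_bound_distance` with its `d_min` and `d_max` values (in ångströms) is a hypothetical linear mapping in which more similar profiles give a tighter upper bound.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length phenotypic profiles.

    Assumption: a stand-in similarity measure, not necessarily the one
    used in the paper.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no similarity signal
    return cov / math.sqrt(var_x * var_y)


def upper_bound_distance(similarity, d_min=10.0, d_max=40.0):
    """Hypothetical monotone mapping from similarity to a distance bound.

    Similarity is clamped to [0, 1]; a similarity of 1 gives the tightest
    bound (d_min) and a similarity of 0 or below gives the loosest (d_max).
    """
    s = min(max(similarity, 0.0), 1.0)
    return d_max - s * (d_max - d_min)


# Toy profiles: one score per gene deletion or environmental perturbation.
profile_a = [0.2, -1.1, 0.8, 1.5, -0.3]
profile_b = [0.1, -0.9, 1.0, 1.2, -0.5]
bound = upper_bound_distance(pearson(profile_a, profile_b))
```

Each such bound would then enter the modeling as one restraint on a residue pair, alongside any other restraints such as cross-links.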
Myriad mechanisms diversify the sequence content of eukaryotic transcripts at the DNA and RNA level with profound functional consequences. Examples include diversity generated by RNA splicing and V(D)J recombination. Today, these and other events are detected with fragmented bioinformatic tools that require predefining a form of transcript diversification; moreover, they rely on alignment to a necessarily incomplete reference genome, filtering out unaligned sequences, which can be among the most interesting. Each of these steps introduces blind spots for discovery. Here, we develop NOMAD+, a new analytic method that performs unified, reference-free statistical inference directly on raw sequencing reads, extending the core NOMAD algorithm to include a micro-assembly and interpretation framework. NOMAD+ discovers broad and new examples of transcript diversification in single cells, bypassing genome alignment and requiring no cell type metadata, which is impossible with current algorithms. In 10,326 primary human single cells in 19 tissues profiled with SmartSeq2, NOMAD+ discovers a set of splicing and histone regulators with highly conserved intronic regions that are themselves targets of complex splicing regulation, as well as unreported transcript diversity in the heat shock protein HSP90AA1. NOMAD+ simultaneously discovers diversification in centromeric RNA expression, V(D)J recombination, RNA editing, and repeat expansions that are missed by, or impossible to measure with, existing bioinformatic methods. NOMAD+ is a unified, highly efficient algorithm enabling unbiased discovery of an unprecedented breadth of RNA regulation and diversification in single cells through a new paradigm to analyze the transcriptome.
We present a unifying statistical formulation for many fundamental problems in genome science and develop a reference-free, highly efficient algorithm that solves it. Sequence diversification - nucleic acid mutation, rearrangement, and reassortment - is necessary for the differentiation and adaptation of all replicating organisms. Identifying sample-dependent sequence diversification, e.g. adaptation or regulated isoform expression, is fundamental to many biological studies, and is achieved today with next-generation sequencing. Paradoxically, current analyses begin with attempts to align to or assemble necessarily incomplete reference genomes, a step that is at odds with detecting the most important examples of sequence diversification. In addition to being computationally expensive, reference-first approaches suffer from diminished discovery power: they are blind to unaligned or mis-aligned sequences. We provide a unifying formulation for detecting sample-dependent sequence diversification that subsumes core problems faced in diverse biological fields. This formulation allows us to construct an algorithm that performs inference on raw reads, avoiding references completely. We illustrate the power of our approach for new data-driven biological discovery with examples of novel single-cell resolved, cell-type-specific isoform expression, including expression in the major histocompatibility complex, and de novo prediction of viral protein adaptation including in SARS-CoV-2.
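The idea of testing for sample-dependent sequence diversification directly on raw reads can be sketched as follows. The abstract does not specify the statistic or the data structure, so this is an illustration under stated assumptions: each fixed-length "anchor" k-mer is paired with the k-mer that follows it (a "target"), targets are counted per sample, and a plain Pearson chi-square statistic is swapped in to measure how strongly the target distribution depends on the sample. The function names and the `k` and `gap` parameters are hypothetical.

```python
from collections import Counter, defaultdict


def count_targets(reads_by_sample, k=4, gap=0):
    """Build anchor -> sample -> Counter(target) from raw reads.

    Assumption: the anchor is a k-mer and the target is the k-mer that
    starts `gap` bases after the anchor ends. No reference is consulted.
    """
    table = defaultdict(lambda: defaultdict(Counter))
    for sample, reads in reads_by_sample.items():
        for read in reads:
            for i in range(len(read) - 2 * k - gap + 1):
                anchor = read[i:i + k]
                target = read[i + k + gap:i + 2 * k + gap]
                table[anchor][sample][target] += 1
    return table


def chi_square(counts_by_sample):
    """Pearson chi-square statistic for a samples x targets count table.

    Assumption: a textbook stand-in for the paper's own statistic.
    Larger values mean the targets following an anchor differ more
    between samples.
    """
    samples = list(counts_by_sample)
    targets = sorted({t for c in counts_by_sample.values() for t in c})
    row_total = {s: sum(counts_by_sample[s].values()) for s in samples}
    col_total = {t: sum(counts_by_sample[s][t] for s in samples) for t in targets}
    total = sum(row_total.values())
    stat = 0.0
    for s in samples:
        for t in targets:
            expected = row_total[s] * col_total[t] / total
            if expected > 0:
                stat += (counts_by_sample[s][t] - expected) ** 2 / expected
    return stat


# Toy input: two samples whose reads share an anchor but differ downstream.
reads = {"sample_A": ["ACGTTTTT"] * 5, "sample_B": ["ACGTGGGG"] * 5}
table = count_targets(reads, k=4)
score = chi_square(table["ACGT"])
```

Anchors with a high score would be candidates for sample-dependent diversification; a real analysis would also need a calibrated p-value and multiple-testing correction, which this sketch omits.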
RNA processing, including splicing and alternative polyadenylation, is crucial to gene function and regulation, but methods to detect RNA processing from single-cell RNA sequencing data are limited by reliance on pre-existing annotations, peak calling heuristics, and collapsing measurements by cell type. We introduce ReadZS, an annotation-free statistical approach to identify regulated RNA processing in single cells. ReadZS discovers cell type-specific RNA processing in human lung and conserved, developmentally regulated RNA processing in mammalian spermatogenesis—including global 3′ UTR shortening in human spermatogenesis. ReadZS also discovers global 3′ UTR lengthening in Arabidopsis development, highlighting the usefulness of this method in under-annotated transcriptomes.
Post-transcriptional regulation of RNA processing (RNAP), including splicing and alternative polyadenylation (APA), controls eukaryotic gene function. Conservative estimates based on bulk tissue studies conclude that at least 50% of mammalian genes undergo APA. Single-cell RNA sequencing (scRNA-seq) could enable a near-complete estimate of the extent, function, and regulation of these and other forms of RNA processing. Yet statistical methods to detect regulated RNAP have limited detection power because they rely on (a) incomplete annotations of 3' untranslated regions (3' UTRs), (b) peak calling heuristics, (c) analysis based on measurements collapsed over all cells in a cell type (pseudobulking), or (d) APA-specific detection. Here, we introduce ReadZS, a computationally efficient, annotation-free statistical approach to identify regulated RNAP, including but not limited to APA, in single cells. ReadZS rediscovers and substantially extends the scope of known cell type-specific RNAP in the human lung and during human spermatogenesis. The unique single-cell resolution and statistical properties of ReadZS enable discovery of new evolutionarily conserved, developmentally regulated RNAP and of subpopulations of lung-resident macrophages that are homogeneous by gene expression alone.