A transcribed, multi-channel, and continuously evolving molecular recorder

To achieve our goal of a tunable, high-information-content molecular recorder, we used Cas9 to generate insertions or deletions (indels) upon repair of double-stranded breaks, which are inherited in the next generation of cells 11-16. We record within a 205-base-pair synthetic DNA "target site" containing three "cut sites" and a static 8-base-pair "integration barcode" (intBC); target sites are delivered in multiple copies via piggyBac transposition (Fig. 1a, b). We embedded this sequence in the 3' UTR of a constitutively transcribed fluorescent protein so that it can be profiled from the transcriptome. A second cassette encodes three independently transcribed and complementary guide RNAs to permit recording of multiple, distinct signals (Fig. 1a, b) 18.

Our system can store a large amount of information because of the diversity of heritable repair outcomes and the large number of target sites, which can be distinguished by their intBCs (Fig. 1c). DNA repair generates hundreds of unique indels, and the outcome distribution differs between cut sites and is nonuniform: some sites produce highly biased outcomes, while others create a diverse series (Fig. 1c, Extended Data Fig. 1) 19-21. To identify sequences that can tune the mutation rate of our recorder for timescales that are not predefined and may extend from days to months, we screened several series of guide RNAs containing mismatches to their targets 22, monitored their activity on a GFP reporter over a 20-day time course, and selected those that demonstrated a broad dynamic range (Fig. 1d). Slower cutting rates may also improve viability in vivo, as frequent Cas9-mediated double-strand breaks can cause cellular toxicity 23,24. To demonstrate information recovery from single-cell transcriptomes, we stably transduced K562 cells with our technology and generated a primary, cell-barcoded cDNA pool via the 10x Genomics platform, allowing us * * *
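To see why a mismatch series can stretch recording from days to months, it helps to model editing as a simple accumulation process. The sketch below assumes a constant, hypothetical per-day editing probability per cut site and that an indel blocks further cutting at that site; the rates and function names are illustrative and are not the measured values from Fig. 1d.

```python
import math

def fraction_edited(rate_per_day: float, days: float) -> float:
    """Expected fraction of cut sites carrying an indel after `days`.

    Assumes a constant per-day editing probability and that each site is
    edited at most once (an indel is assumed to block further cutting).
    """
    return 1.0 - (1.0 - rate_per_day) ** days

def days_to_fraction(rate_per_day: float, target: float) -> float:
    """Days until the expected edited fraction reaches `target` (0 < target < 1)."""
    return math.log(1.0 - target) / math.log(1.0 - rate_per_day)

# Hypothetical rates: a fully matched guide versus progressively mismatched guides.
for label, rate in [("matched", 0.30), ("1 mismatch", 0.05), ("2 mismatches", 0.005)]:
    print(f"{label}: edited by day 20 = {fraction_edited(rate, 20):.2f}, "
          f"days to 50% edited = {days_to_fraction(rate, 0.5):.1f}")
```

With these made-up rates, the matched guide passes 50% editing within about two days, whereas the two-mismatch guide needs several months, which is the kind of spread a mismatch series is meant to provide.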
The pairing of CRISPR/Cas9-based gene editing with massively parallel single-cell readouts now enables large-scale lineage tracing. However, the rapid growth in complexity of data from these assays has outpaced our ability to accurately infer phylogenetic relationships. First, we introduce Cassiopeia, a suite of scalable maximum parsimony approaches for tree reconstruction. Second, we provide a simulation framework for evaluating algorithms and exploring lineage tracer design principles. Finally, we generate the most complex experimental lineage tracing dataset to date, 34,557 human cells continuously traced over 15 generations, and use it for benchmarking phylogenetic inference approaches. We show that Cassiopeia outperforms traditional methods by several metrics and under a wide variety of parameter regimes, and provide insight into the principles for the design of improved Cas9-enabled recorders. Together, these should broadly enable large-scale mammalian lineage tracing efforts. Cassiopeia and its benchmarking resources are publicly available at www.github.com/YosefLab/Cassiopeia.
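One intuition behind greedy, parsimony-motivated reconstruction is that Cas9 edits are irreversible, so a mutation shared by many cells likely arose early and can be used to split them top-down. The sketch below is a simplified illustration of that idea, not Cassiopeia's API or its exact algorithm: the character-matrix encoding (0 = unedited, -1 = missing, positive integers = indel states) and the function name `greedy_split` are assumptions, and cells with missing data at the split site are naively placed in the "without" group.

```python
from collections import Counter
from typing import Dict, List, Tuple, Union

# A tree is either a leaf (cell name) or a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]
UNEDITED, MISSING = 0, -1  # hypothetical state encoding; positive ints are indels

def greedy_split(cells: List[str], matrix: Dict[str, Tuple[int, ...]]) -> Tree:
    """Top-down tree building: split on the most frequent (site, indel) pair."""
    if len(cells) == 1:
        return cells[0]
    # Count how many cells carry each (character index, indel state) pair.
    counts = Counter(
        (i, s)
        for c in cells
        for i, s in enumerate(matrix[c])
        if s not in (UNEDITED, MISSING)
    )
    # A mutation shared by every cell in the group cannot separate them.
    candidates = [(n, key) for key, n in counts.items() if n < len(cells)]
    if not candidates:
        return tuple(sorted(cells))  # unresolved multifurcation
    _, (char, state) = max(candidates)  # ties broken deterministically
    with_mut = [c for c in cells if matrix[c][char] == state]
    without = [c for c in cells if matrix[c][char] != state]
    return (greedy_split(with_mut, matrix), greedy_split(without, matrix))

matrix = {
    "a": (1, 0, 0),
    "b": (1, 2, 0),
    "c": (1, 2, 3),
    "d": (0, 0, 4),
}
print(greedy_split(sorted(matrix), matrix))
```

Here the indel shared by cells a, b, and c separates them from d first, and the less frequent indels then resolve the remaining structure.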
Benchmarked approaches for reconstruction of in vitro cell lineages and in silico models of C. elegans and M. musculus developmental trees

Highlights
- We organized a DREAM challenge to benchmark methods of cell lineage reconstruction
- Experimental and in silico datasets served as ground-truth trees of 10^2, 10^3, and 10^4 cells
- Smaller trees allowed the training of a machine-learning decision tree approach
- These results delineate a potential way forward for solving larger cell lineage trees
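Benchmarking against ground-truth trees requires a score for how far a reconstruction is from the truth. One widely used option is the Robinson-Foulds distance; the sketch below computes an unnormalized, rooted version by comparing the sets of leaves under each internal node. Trees are encoded as nested tuples of leaf names, which is an assumption for illustration, and this is not necessarily the exact scoring used in the challenge.

```python
from typing import FrozenSet, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", ...]]

def collect_clusters(tree: Tree, out: Set[FrozenSet[str]]) -> FrozenSet[str]:
    """Add the leaf set under every internal node to `out`; return this node's leaves."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(collect_clusters(child, out) for child in tree))
    out.add(leaves)
    return leaves

def rooted_rf(t1: Tree, t2: Tree) -> int:
    """Unnormalized rooted Robinson-Foulds distance between two trees."""
    c1: Set[FrozenSet[str]] = set()
    c2: Set[FrozenSet[str]] = set()
    if collect_clusters(t1, c1) != collect_clusters(t2, c2):
        raise ValueError("trees must have the same leaves")
    # Count clusters present in exactly one of the two trees.
    return len(c1 ^ c2)

truth = ((("a", "b"), "c"), ("d", "e"))
reconstructed = ((("a", "c"), "b"), ("d", "e"))
print(rooted_rf(truth, reconstructed))  # 2: {a,b} vs. {a,c}
```

The root cluster is shared by construction, so it never contributes; dividing by the total number of non-trivial clusters gives a normalized score that is easier to compare across tree sizes.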
Gene regulatory elements play a key role in orchestrating gene expression during cellular differentiation, but what determines their function over time remains largely unknown. Here, we perform perturbation-based massively parallel reporter assays at seven early time points of neural differentiation to systematically characterize how regulatory elements and the motifs within them guide cellular differentiation. By perturbing over 2,000 putative DNA binding motifs in active regulatory regions, we delineate four categories of functional elements, and observe that activity direction is mostly determined by the sequence itself, while the magnitude of effect depends on the cellular environment. We also find that fine-tuning of transcription rates is often achieved by the combined activity of adjacent activating and repressing elements. Our work provides a blueprint for the sequence components needed to induce different transcriptional patterns, both in general and specifically during neural differentiation.
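The distinction between effect direction and effect magnitude can be made concrete with a toy calculation. The sketch assumes reporter activity is measured as log2(RNA/DNA), a common MPRA convention, and that a motif's effect is mutant minus wild-type activity at each time point; the 0.5 threshold, function names, and labels are hypothetical and do not correspond to the study's four categories.

```python
import math
from typing import List, Sequence, Tuple

Counts = Tuple[float, float]  # (RNA count, DNA count), assumed already normalized

def activity(counts: Counts) -> float:
    """Reporter activity as log2(RNA / DNA)."""
    rna, dna = counts
    return math.log2(rna / dna)

def perturbation_effects(wild_type: Sequence[Counts], mutant: Sequence[Counts]) -> List[float]:
    """Mutant minus wild-type activity at each time point."""
    return [activity(m) - activity(w) for w, m in zip(wild_type, mutant)]

def classify_motif(effects: Sequence[float], threshold: float = 0.5) -> str:
    """Label a motif by the direction of its perturbation effect.

    Destroying an activating motif lowers activity (negative effect);
    destroying a repressing motif raises it (positive effect).
    """
    strong = [e for e in effects if abs(e) >= threshold]
    if not strong:
        return "no effect"
    if all(e < 0 for e in strong):
        return "activating"
    if all(e > 0 for e in strong):
        return "repressing"
    return "direction changes over time"

# Toy counts at three time points (not data from the study).
wt = [(200, 100), (400, 100), (800, 100)]
mut = [(100, 100), (100, 100), (200, 100)]
effects = perturbation_effects(wt, mut)
print(effects, classify_motif(effects))
```

In this toy example the effect keeps the same sign at every time point while its size changes, mirroring the pattern of a sequence-determined direction with an environment-dependent magnitude.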
Background
Network connectivity problems are abundant in computational biology research, where graphs are used to represent a range of phenomena, from physical interactions between molecules to more abstract relationships such as gene co-expression. One common challenge in studying biological networks is the need to extract meaningful, small subgraphs from large databases of potential interactions. A useful abstraction for this task has been the family of Steiner Network problems: given a reference "database" graph, find a parsimonious subgraph that satisfies a given set of connectivity demands. While this formulation has proved useful in a number of instances, the next challenge is to account for the fact that the reference graph may not be static. This can happen, for instance, when studying protein measurements in single cells or at different time points, where different subsets of conditions can have different protein milieus.

Results and discussion
We introduce the condition Steiner Network problem, in which we concomitantly consider a set of distinct biological conditions. Each condition is associated with a set of connectivity demands, as well as a set of edges that are assumed to be present in that condition. The goal of this problem is to find a minimal subgraph that satisfies all the demands through paths that are present in the respective condition. We show that introducing multiple conditions as an additional factor makes this problem much harder to approximate. Specifically, we prove that for C conditions, this new problem is NP-hard to approximate to a factor of C − ε, for every C ≥ 2 and ε > 0, and that this bound is tight. Moving beyond the worst case, we explore a special set of instances where the reference graph grows monotonically between conditions, and show that this problem admits substantially improved approximation algorithms.
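The problem definition can be made concrete by checking whether a candidate subgraph is feasible: every demand of every condition must be connected using only edges that are both selected and present in that condition. The sketch below only verifies feasibility (the hard part is finding a minimum-weight feasible subgraph); the edge encoding, the `Condition` type, and the function names are assumptions for illustration.

```python
from collections import deque
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Edge = FrozenSet[str]  # undirected edge {u, v}; self-loops are not supported
Condition = Tuple[Set[Edge], List[Tuple[str, str]]]  # (edges present, demand pairs)

def edge(u: str, v: str) -> Edge:
    return frozenset((u, v))

def connected(edges: Iterable[Edge], s: str, t: str) -> bool:
    """Breadth-first search from s to t over an undirected edge set."""
    adj: Dict[str, Set[str]] = {}
    for e in edges:
        u, v = tuple(e)
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen, queue = {s}, deque([s])
    while queue:
        node = queue.popleft()
        if node == t:
            return True
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

def is_feasible(selected: Set[Edge], conditions: List[Condition]) -> bool:
    """Every demand of every condition must be met using only edges that are
    both selected and present in that condition."""
    return all(
        connected(selected & present, s, t)
        for present, demands in conditions
        for s, t in demands
    )

ab, bc, ac = edge("a", "b"), edge("b", "c"), edge("a", "c")
conditions = [({ab, bc}, [("a", "c")]),  # condition 1 lacks the a-c edge
              ({ac}, [("a", "c")])]      # condition 2 has only the a-c edge
print(is_feasible({ab, bc}, conditions), is_feasible({ab, bc, ac}, conditions))
```

The same demand (a, c) must be routed differently in each condition, so the feasible subgraph needs all three edges even though any single condition could be served with fewer; this is the extra cost that multiple conditions introduce.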
We also developed an integer linear programming solver for the general problem and demonstrated its ability to reach optimality on instances from the human protein interaction network.

Conclusion
Our results demonstrate that, in contrast to most connectivity problems studied in computational biology, accounting for the multiplicity of biological conditions adds considerable complexity, which we propose to address with a new solver. Importantly, our results extend to several network connectivity problems that are commonly used in computational biology, such as Prize-Collecting Steiner Tree, and provide insight into the theoretical guarantees for their applications in a multiple-condition setting.
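For readers unfamiliar with how such a problem is cast as an integer linear program, one standard approach is a multicommodity-flow formulation. The version below is a sketch under our own assumptions, not necessarily the formulation used by the authors' solver: $w_e$ is an edge weight, $x_e \in \{0,1\}$ marks whether edge $e$ is selected, $E_c \subseteq E$ and $D_c$ are the edges and demands of condition $c$, and each demand $d = (s_d, t_d)$ with $s_d \neq t_d$ sends one unit of flow $f^{c,d}$ along both orientations of edges present in that condition.

```latex
\begin{aligned}
\min_{x,\,f}\quad & \sum_{e \in E} w_e\, x_e \\
\text{s.t.}\quad & \sum_{v:\,\{u,v\} \in E_c} \left( f^{c,d}_{uv} - f^{c,d}_{vu} \right) =
  \begin{cases} 1 & u = s_d \\ -1 & u = t_d \\ 0 & \text{otherwise} \end{cases}
  && \forall c \in \{1,\dots,C\},\ \forall d \in D_c,\ \forall u \in V \\
& 0 \le f^{c,d}_{uv} \le x_e,\quad 0 \le f^{c,d}_{vu} \le x_e
  && \forall c,\ \forall d \in D_c,\ \forall e = \{u,v\} \in E_c \\
& x_e \in \{0,1\} && \forall e \in E
\end{aligned}
```

Because flow for condition $c$ may only use edges in $E_c$, and only where $x_e = 1$, a unit of flow from $s_d$ to $t_d$ exists exactly when the selected edges present in that condition connect the pair.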