As the scale of single-cell genomics experiments grows into the millions of cells, the computational requirements for processing these data are beyond the reach of many researchers. Herein we present Scarf, a modularly designed Python package that seamlessly interoperates with other single-cell toolkits and allows for memory-efficient single-cell analysis of millions of cells on a laptop or on low-cost devices such as single-board computers. We demonstrate Scarf’s memory and compute-time efficiency by applying it to the largest existing single-cell RNA-Seq and ATAC-Seq datasets. Scarf wraps memory-efficient implementations of a graph-based t-distributed stochastic neighbour embedding (t-SNE) and a hierarchical clustering algorithm. Moreover, Scarf performs accurate reference-anchored mapping of datasets while maintaining memory efficiency. By implementing a subsampling algorithm, Scarf can additionally generate a representative sample of cells from a given dataset in which rare cell populations and lineage differentiation trajectories are conserved. Together, Scarf provides a framework wherein any researcher can perform advanced processing, subsampling, reanalysis, and integration of atlas-scale datasets on standard laptop computers. Scarf is available on GitHub: https://github.com/parashardhapola/scarf.
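The abstract does not describe how Scarf's subsampling conserves rare populations. To illustrate the general idea of downsampling without losing small groups, here is a minimal sketch of a plain cluster-stratified subsample. This is a swapped-in, simpler technique, not Scarf's algorithm; the function name `stratified_subsample` and its `min_per_group` floor are hypothetical, and the sketch assumes cluster labels have already been computed.

```python
import numpy as np


def stratified_subsample(labels, fraction, min_per_group=10, seed=0):
    """Return sorted indices of a subsample that keeps every cluster.

    Hypothetical illustration, not Scarf's method. Each cluster contributes
    round(fraction * size) cells, but never fewer than min_per_group (or the
    whole cluster if it is smaller than that), so rare populations survive
    aggressive downsampling.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    keep = []
    for group in np.unique(labels):
        idx = np.flatnonzero(labels == group)
        # Proportional share, raised to the per-cluster floor, capped at cluster size.
        n = max(int(round(fraction * idx.size)), min_per_group)
        n = min(n, idx.size)
        keep.append(rng.choice(idx, size=n, replace=False))
    return np.sort(np.concatenate(keep))
```

For example, with 10,000 cells in one cluster and 5 cells in a rare cluster, a 1% subsample keeps 100 cells from the large cluster and all 5 rare cells, whereas uniform random sampling of 100 cells would usually miss the rare cluster entirely.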
Single-cell transcriptomics facilitates innovative approaches to defining and identifying cell types within tissues and cell populations. An emerging interest in the cancer field is assessing the heterogeneity of transformed cells, including identifying tumor-initiating cells based on their similarity to normal counterparts. However, such cell mapping is often confounded by the large effects on global gene expression programs introduced by strong perturbations such as an oncogenic event. Here, we present Nabo, a novel computational method that maps cells from one population to the most similar cells in a reference population, independently of confounding changes to gene expression programs initiated by the perturbation. We validated this method on multiple datasets from different sources and platforms and show that Nabo achieves higher accuracy than conventional classification methods. Nabo is available as an integrated toolkit for preprocessing, cell mapping, differential gene expression identification, and visualization of single-cell RNA-Seq data. For exploratory studies, Nabo includes methods to help evaluate the reliability of cell mapping results. We applied Nabo to droplet-based single-cell RNA-Seq data of healthy and oncogene-induced (MLL-ENL) hematopoietic progenitor cells (GMLPs) differentiating in vitro. Despite substantial cellular heterogeneity resulting from the differentiation of GMLPs and the large transcriptional effects induced by the fusion oncogene, Nabo could pinpoint the specific cell stage at which differentiation arrest occurs, yielding an immunophenotypic definition of the tumor-initiating population. Thus, Nabo allows relevant comparisons between target and control cells without being confounded by differences in population heterogeneity.
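The core notion of "mapping cells to the most similar cells in a reference population" can be illustrated with a generic k-nearest-neighbour label transfer. The sketch below assumes both populations are already normalised into the same feature space and uses cosine similarity with a majority vote; it is a swapped-in baseline for illustration only and does not reproduce Nabo's procedure for discounting perturbation-induced expression changes. The function name `map_to_reference` and the parameter `k` are hypothetical.

```python
from collections import Counter

import numpy as np


def map_to_reference(target, reference, ref_labels, k=5):
    """Assign each target cell the majority label of its k most similar reference cells.

    Hypothetical kNN label transfer for illustration, not Nabo's algorithm.
    Rows are cells, columns are features (e.g. normalised gene expression).
    """

    def l2_normalise(x):
        x = np.asarray(x, dtype=float)
        norms = np.linalg.norm(x, axis=1, keepdims=True)
        norms[norms == 0] = 1.0  # avoid division by zero for empty cells
        return x / norms

    # Cosine similarity between every target cell and every reference cell.
    sim = l2_normalise(target) @ l2_normalise(reference).T
    # Indices of the k most similar reference cells for each target cell.
    top_k = np.argsort(-sim, axis=1)[:, :k]
    ref_labels = np.asarray(ref_labels)
    return [Counter(ref_labels[row]).most_common(1)[0][0] for row in top_k]
```

A plain classifier like this inherits any global expression shift present in the target cells, which is precisely the confounding effect the abstract says Nabo is designed to avoid.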
The increasing capacity to perform large-scale single-cell genomic experiments continues to outpace the ability to efficiently handle the growing datasets. Herein we present Scarf, a modularly designed Python package that seamlessly interoperates with other single-cell toolkits and allows for memory-efficient single-cell analysis of millions of cells on a laptop or on low-cost devices such as single-board computers. We demonstrate Scarf's memory and compute-time efficiency by applying it to the largest existing single-cell RNA-Seq and ATAC-Seq datasets. Scarf wraps memory-efficient implementations of a graph-based t-distributed stochastic neighbour embedding (t-SNE) and a hierarchical clustering algorithm. Moreover, Scarf performs accurate reference-anchored mapping of datasets while maintaining memory efficiency. By implementing a novel data downsampling algorithm, Scarf can additionally generate a representative sample of cells from a given dataset in which rare cell populations and lineage differentiation trajectories are conserved. Together, Scarf provides a framework wherein any researcher can perform advanced processing, downsampling, reanalysis, and integration of atlas-scale datasets on standard laptop computers.
Knowledge of human fetal blood development, and of how it differs from adult blood development, is highly relevant for our understanding of congenital blood and immune disorders as well as childhood leukemia, the latter known to originate in utero. Blood production during development occurs in waves that overlap in time and space, adding to heterogeneity and necessitating single-cell approaches. Here, a combined single-cell immunophenotypic and transcriptional map of first-trimester primitive blood development is presented. Using CITE-seq (Cellular Indexing of Transcriptomes and Epitopes by Sequencing), the molecular profiles of progenitors gated by established immunophenotypes were analyzed in the fetal liver (FL). Classical markers for hematopoietic stem cells (HSCs), such as CD90 and CD49F, were largely preserved, whereas CD135 (FLT3) and CD123 (IL3R) had a ubiquitous expression pattern capturing heterogeneous populations. Direct molecular comparison with an adult bone marrow (BM) data set revealed that HSC-like cells were less frequent in FL, whereas cells with a lympho-myeloid signature were more abundant. Furthermore, an erythro-myeloid primed multipotent progenitor cluster was identified, potentially representing a transient, FL-specific progenitor. Based on the projection onto the adult BM data set, genes up- and downregulated between fetal and adult cells were analyzed. In general, cell cycle pathways, including MYC targets, were upregulated in fetal cells, whereas gene sets involved in inflammation and the human leukocyte antigen (HLA) complex were downregulated. Importantly, a fetal core molecular signature was identified that could discriminate certain types of infant and childhood leukemia from their adult counterparts. The detailed single-cell map presented herein highlights molecular as well as immunophenotypic differences between fetal and adult primitive blood cells, of significance for future studies of pediatric leukemia and blood development in general.