Data integration of single-cell RNA-seq (scRNA-seq) data is the task of embedding datasets gathered from different sources or experiments into a common representation, so that cells of similar type or state are embedded close to one another regardless of their dataset of origin. Data integration is a crucial step in most scRNA-seq analysis pipelines involving multiple batches. It improves data visualization, batch effect reduction, clustering, label transfer, and cell type inference. Many data integration tools have been proposed during the last decade, but the surge in the number of these methods has made it difficult to pick one for a given use case. Furthermore, these tools are provided as rigid pieces of software, making it hard to adapt them to specific scenarios. To address both issues at once, we introduce the transmorph framework. It allows the user to engineer powerful data integration pipelines and is supported by a rich software ecosystem. We demonstrate the usefulness of transmorph by solving a variety of practical challenges on scRNA-seq datasets, including joint dataset embedding, gene space integration, and transfer of cell cycle phase annotations. transmorph is provided as an open source Python package.
Data integration of single-cell data is the task of embedding datasets obtained from different sources into a common space, so that cells of similar type or state end up close to one another in this representation regardless of their dataset of origin. Data integration is a crucial early step in most data analysis pipelines involving multiple batches, and it enables informative data visualization, batch effect reduction, high-resolution clustering, accurate label transfer, and cell type inference. Many tools have been proposed over the last decade to tackle data integration, and some of them are routinely used today within data analysis workflows. Despite continued efforts to conduct exhaustive benchmarking studies, a recent surge in the number of these methods has made it difficult to choose one objectively for a given use case. Furthermore, these tools are generally provided as rigid pieces of software that give the user little to no control over their internal parameters and algorithms, which makes it hard to adapt them to a variety of use cases. To address both of these issues at once, we introduce transmorph, an ambitious unifying framework for data integration. It allows building complex data integration pipelines by combining existing and original algorithmic modules, and is supported by a rich software ecosystem to easily benchmark modules and to analyze and report results. We demonstrate transmorph's capabilities and the value of its expressiveness by solving a variety of practical single-cell applications, including supervised and unsupervised joint dataset embedding, RNA-seq integration in gene space, and label transfer of cell cycle phase within the cell cycle gene space. We provide transmorph as a free, open source, and computationally efficient Python library, with a particular effort to make it compatible with other state-of-the-art tools and workflows.