Single-cell RNA-seq (scRNAseq) is a powerful tool for studying cellular heterogeneity. Recently, several clustering-based methods have been proposed to identify distinct cell populations. These methods rely on different statistical models and usually require several additional steps, such as preprocessing or dimension reduction, before the clustering algorithm is applied. Individual steps are often controlled by method-specific parameters, so the same method can be run in different modes on the same dataset depending on the user's choices. The large number of options these methods provide can intimidate non-expert users, since the available choices are not always clearly documented. In addition, to date no large study has investigated the role and impact of these choices in different experimental contexts. This work aims to provide new insights into the advantages and drawbacks of scRNAseq clustering methods and to describe the range of options offered to users. In particular, we provide an extensive evaluation of several methods under different modes of usage and parameter settings, applying them to real and simulated datasets that vary in dimensionality, number of cell populations and level of noise. Notably, the results presented here show that much of the variability in model performance is attributable to the user's choice of parameter settings. We describe several performance trends associated with modes of usage and types of datasets, and identify the methods whose computational time is strongly affected by data dimensionality. Finally, we highlight some open challenges in scRNAseq data clustering, such as identifying the number of clusters.
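The abstract does not name the metric used to score clustering results; one common choice for comparing a clustering against known cell-type labels is the adjusted Rand index (ARI). The sketch below is a minimal pure-Python version, assuming flat (non-hierarchical) labels, that shows how runs of one method under different parameter settings could be scored against a reference partition. The labels and the setting names (`k=2`, `k=3`, `k=5`) are hypothetical toy data, not results from the study.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two flat clusterings of the same cells."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label vectors must have the same length")
    n = len(labels_true)
    if n < 2:
        raise ValueError("need at least two cells")
    # Contingency counts n_ij: cells in cluster i of one partition and cluster j of the other.
    pair_counts = Counter(zip(labels_true, labels_pred))
    sum_ij = sum(comb(c, 2) for c in pair_counts.values())
    # Pairs of cells that share a cluster within each partition separately.
    sum_a = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Only happens when both partitions are trivial and identical
        # (all cells in one cluster, or every cell in its own cluster).
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


# Hypothetical reference labels and outputs of one method under three parameter settings.
reference = ["T", "T", "T", "B", "B", "B", "NK", "NK"]
runs = {
    "k=3": [0, 0, 0, 1, 1, 1, 2, 2],
    "k=2": [0, 0, 0, 1, 1, 1, 1, 1],
    "k=5": [0, 0, 1, 2, 2, 3, 4, 4],
}
for setting, labels in runs.items():
    print(setting, round(adjusted_rand_index(reference, labels), 3))
```

ARI is invariant to how cluster IDs are numbered and is corrected for chance agreement, which is why it is often preferred over raw accuracy when the number of clusters itself depends on a parameter.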
Significant innovations in next-generation sequencing techniques and bioinformatics tools have deepened our appreciation and understanding of RNA. Practical RNA sequencing (RNA-Seq) applications have evolved alongside advances in sequencing technology and bioinformatic tools. In most projects, bulk RNA-Seq data are used to measure gene expression patterns, isoform expression, alternative splicing and single-nucleotide polymorphisms. However, RNA-Seq holds far more hidden biological information, including details of copy number alterations, microbial contamination, transposable elements, cell type composition (deconvolution) and the presence of neoantigens. Recently developed bioinformatic algorithms can retrieve this information from bulk RNA-Seq data, thus broadening its scope. This review focuses on emerging bulk RNA-Seq-based analyses, emphasizing less familiar and underused applications. In doing so, we highlight the power of bulk RNA-Seq to provide biological insights.
Bi-allelic hypomorphic mutations in DNMT3B disrupt DNA methyltransferase activity and lead to immunodeficiency, centromeric instability, facial anomalies syndrome, type 1 (ICF1). Although several ICF1 phenotypes have been linked to abnormally hypomethylated repetitive regions, the unique genomic regions responsible for the remaining disease phenotypes are still largely uncharacterized. Here we examined induced pluripotent stem cells (iPSCs) derived from two ICF1 patients, together with their CRISPR-Cas9-corrected clones, to determine whether DNMT3B correction can globally overcome DNA methylation defects and related changes in the epigenome. Hypomethylated regions throughout the genome are highly comparable between ICF1 iPSCs carrying different DNMT3B variants, and significantly overlap with those in ICF1 patient peripheral blood and lymphoblastoid cell lines. These regions include large CpG island domains, as well as promoters and enhancers of several lineage-specific genes, particularly immune-related genes, suggesting that they are premarked during early development. CRISPR-corrected ICF1 iPSCs reveal that the majority of phenotype-related hypomethylated regions reacquire normal DNA methylation levels following editing. However, at the most severely hypomethylated regions in ICF1 iPSCs, which also display the largest increases in H3K4me3 levels and/or abnormal CTCF binding, the epigenetic memory persists and hypomethylation remains uncorrected. Overall, we demonstrate that restoring the catalytic activity of DNMT3B can reverse the majority of the aberrant ICF1 epigenome. However, a small fraction of the genome is resilient to this rescue, highlighting the challenge of reverting disease states that are due to genome-wide epigenetic perturbations. Uncovering the basis for the persistent epigenetic memory will promote the development of strategies to overcome this obstacle.
Background: Bi-allelic hypomorphic mutations in DNMT3B disrupt DNA methyltransferase activity and lead to Immunodeficiency, Centromeric instability, Facial anomalies syndrome, type 1 (ICF1). While several ICF1 phenotypes have been linked to abnormally hypomethylated repetitive regions, the unique genomic regions responsible for the remaining disease phenotypes are still largely uncharacterized. Here we explored induced pluripotent stem cells (iPSCs) derived from two ICF1 patients and their CRISPR/Cas9-corrected clones to determine whether gene correction can overcome DNA methylation defects and associated changes in the epigenome of non-repetitive regions. Results: Hypomethylated regions throughout the genome are highly comparable between ICF1 iPSCs carrying different DNMT3B variants, and significantly overlap with those in ICF1 patient peripheral blood and lymphoblastoid cell lines. These regions include large CpG island domains, as well as promoters and enhancers of several lineage-specific genes, particularly immune-related genes, suggesting that they are premarked during early development. The gene-corrected ICF1 iPSCs reveal that the majority of phenotype-related hypomethylated regions reacquire normal DNA methylation levels following editing. However, at the most severely hypomethylated regions in ICF1 iPSCs, which also display the largest increases in H3K4me3 levels and enrichment of CTCF-binding motifs, the epigenetic memory persisted and hypomethylation remained uncorrected. Conclusions: Restoring the catalytic activity of DNMT3B rescues the majority of the aberrant ICF1 epigenome. However, a small fraction of the genome is resilient to this reversal, highlighting the challenge of reverting disease states that are due to genome-wide epigenetic perturbations. Uncovering the basis for the persistent epigenetic memory will promote the development of strategies to overcome this obstacle.
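The rescue analysis described in the ICF1 abstracts can be illustrated with a toy classification. The sketch below assumes per-region mean CpG methylation values (0 to 1) for control, ICF1 and corrected iPSCs. It flags a region as ICF1-hypomethylated when it loses at least `min_loss` relative to control, then calls it rescued if the corrected clone returns to within `tolerance` of control, and persistent otherwise. The `Region` type, the threshold values and the region data are all hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    control: float    # hypothetical mean CpG methylation (0-1) in control iPSCs
    icf1: float       # hypothetical mean methylation in ICF1 iPSCs
    corrected: float  # hypothetical mean methylation in the gene-corrected clone


def classify(regions, min_loss=0.2, tolerance=0.1):
    """Split ICF1-hypomethylated regions into rescued and persistent sets.

    Both thresholds are illustrative assumptions, not values from the study.
    """
    rescued, persistent = [], []
    for r in regions:
        loss = r.control - r.icf1
        if loss < min_loss:
            continue  # not considered hypomethylated in ICF1 under this threshold
        if r.control - r.corrected <= tolerance:
            rescued.append(r.name)      # methylation restored close to control level
        else:
            persistent.append(r.name)   # hypomethylation remains after correction
    return rescued, persistent


# Toy regions; values are made up purely to exercise the logic.
regions = [
    Region("region_A", control=0.85, icf1=0.55, corrected=0.82),
    Region("region_B", control=0.90, icf1=0.20, corrected=0.35),
    Region("region_C", control=0.80, icf1=0.75, corrected=0.78),
    Region("region_D", control=0.75, icf1=0.40, corrected=0.70),
]
rescued, persistent = classify(regions)
print("rescued:", rescued)
print("persistent:", persistent)
```

Ranking the flagged regions by `loss` would then let one ask whether the persistent set is concentrated among the most severely hypomethylated regions, which is the pattern the abstracts report.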
Background: Inflammatory bowel disease (IBD) is a complex disease characterised by chronic inflammation of the digestive tract. Genome-wide association studies (GWAS) have identified 241 risk loci significantly associated with the two common forms of IBD, Crohn’s disease (CD) and ulcerative colitis. The vast majority of these risk loci reside in non-coding regions of the genome, and only for a minority do we know which gene is dysregulated to increase disease risk. This knowledge gap makes it difficult to gain insight into disease pathology and to identify new candidate drug targets. Methods: To gain more biological insight from IBD GWAS, we generated single-cell RNA-sequencing data from ileal biopsies obtained from 25 CD patients with active ileal inflammation and 26 non-IBD controls. We identified 49 different cell types among the ~140K sequenced cells (Fig. 1), including all major immune, enterocyte, secretory and mesenchymal populations. Our optimized single-cell dissociation protocol preserves epithelial cells at the top of the villus, which are inherently prone to anoikis, enabling high-quality transcriptomes to be generated from these cells for the first time.
Fig. 1: Single-cell atlas of terminal ileum biopsies from Crohn’s disease and non-IBD individuals.
Results: We identified 797 unique genes differentially expressed between CD patients and controls, with notable expression differences in stem cell, secretory and enterocyte populations. Genes involved in antigen presentation and interferon-gamma signaling were enriched among the genes most frequently dysregulated across cell types. To identify which of these expression differences are likely to cause disease rather than simply result from it, we integrated results from the latest IBD GWAS to assess the extent to which these genes captured disease heritability, and in which cell types.
Genes specifically expressed in Tregs, monocytes and IL10RA-negative monocyte-derived macrophages captured a significant fraction of disease heritability, strongly implicating these cell types in disease pathogenesis. We investigated which genes were driving these enrichment signals and identified candidate effector genes at many IBD risk loci. Reassuringly, we recovered many confirmed IBD effector genes with known roles in the normal function of these cell types, including NOD2, IL18RAP, IL23R, NCF4 and IL2RA. Conclusion: Single-cell analysis combined with IBD genetics has generated strong evidence for a causal role of novel disease mechanisms with therapeutic potential. Further experiments are underway to validate these findings. At ECCO we will present an updated version of this analysis, including genetic mapping of gene regulation across cell types to identify IBD effector genes and causal variants.