Integrating single-cell RNA sequencing data across samples has been a major challenge in analyzing cell populations. However, strategies for integrating differential expression analysis of single-cell data remain underinvestigated. Here, we benchmark 46 workflows for differential expression analysis of single-cell data with multiple batches. We show that batch effects, sequencing depth and data sparsity substantially affect their performance. Notably, we find that using batch-corrected data rarely improves the analysis of sparse data, whereas batch covariate modeling improves the analysis when batch effects are substantial. We show that for low-depth data, single-cell techniques based on zero-inflation models degrade performance, whereas analysis of uncorrected data using limmatrend, the Wilcoxon test and a fixed effects model performs well. We suggest several high-performance methods under different conditions based on various simulation and real data analyses. Additionally, we demonstrate that differential expression analysis for a specific cell type outperforms that of large-scale bulk sample data in prioritizing disease-related genes.
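As a rough illustration of what "batch covariate modeling" means in this setting, the sketch below fits, for each gene, an ordinary least squares model of uncorrected log-expression on a condition indicator plus batch indicators, and reports the condition coefficient and its t-statistic. It is a plain OLS stand-in written for illustration, not any of the benchmarked workflows (it is not limmatrend, MAST or edgeR); the function name `group_effect_with_batch` and the simulated data are assumptions introduced here.

```python
import numpy as np


def group_effect_with_batch(log_expr, group, batch):
    """Per-gene OLS fit of log-expression ~ group + batch (illustrative only).

    log_expr: (cells, genes) array of uncorrected log-normalized expression.
    group:    (cells,) 0/1 condition labels.
    batch:    (cells,) batch labels.
    Returns (coef, t_stat) for the group term, each of shape (genes,).
    """
    log_expr = np.asarray(log_expr, dtype=float)
    group = np.asarray(group, dtype=float)
    batch = np.asarray(batch)

    # One-hot batch indicators, dropping the first level as the reference.
    levels = np.unique(batch)
    batch_dummies = (batch[:, None] == levels[None, 1:]).astype(float)
    design = np.column_stack([np.ones(len(group)), group, batch_dummies])

    beta, _, rank, _ = np.linalg.lstsq(design, log_expr, rcond=None)
    if rank < design.shape[1]:
        raise ValueError("group and batch are confounded; design is rank-deficient")

    # Residual variance per gene and standard error of the group coefficient.
    resid = log_expr - design @ beta
    dof = design.shape[0] - design.shape[1]
    sigma2 = (resid ** 2).sum(axis=0) / dof
    xtx_inv = np.linalg.inv(design.T @ design)
    se = np.sqrt(sigma2 * xtx_inv[1, 1])
    return beta[1], beta[1] / se


# Simulated example (assumed data): gene 0 has a true group effect of 1.0,
# gene 1 has none; both genes carry a batch shift of 3.0.
rng = np.random.default_rng(0)
n_cells = 200
group = np.tile([0, 1], n_cells // 2)
batch = np.repeat(["b1", "b2"], n_cells // 2)
batch_shift = 3.0 * (batch == "b2")
log_expr = np.column_stack([
    1.0 * group + batch_shift + rng.normal(0, 0.5, n_cells),
    batch_shift + rng.normal(0, 0.5, n_cells),
])
coef, t_stat = group_effect_with_batch(log_expr, group, batch)
print("group coefficients:", coef)
print("t-statistics:", t_stat)
```

Because the batch term absorbs the shift between batches, the group coefficient is estimated from within-batch differences. The data are left uncorrected and the batch enters only as a covariate.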
Although the molecular rules governing genome organization are being quickly elucidated, relatively few proteins regulating this process have been identified. To address this gap, we developed a fully automated imaging pipeline, called HiDRO (high-throughput DNA or RNA labeling with optimized Oligopaints), that permits quantitative measurement of chromatin interactions across a large number of samples. Using HiDRO, we screened the human druggable genome and identified >300 factors that regulate chromatin folding during interphase, including 43 validated hits that either increase or decrease interactions between topologically associating domains (TADs). We discovered that genetic or chemical inhibition of the ubiquitous kinase GSK3A enhances long-range interactions by dysregulating cohesin-mediated chromatin looping. Collectively, these results highlight a noncanonical role for GSK3A signaling in nuclear architecture and underscore the broader utility of HiDRO-based screening to identify novel mechanisms that drive the spatial organization of the genome.
Integrating single-cell RNA sequencing (scRNA-seq) data across samples has been a major challenge in analyzing cell populations. However, strategies for integrating differential expression (DE) analysis of scRNA-seq data remain underinvestigated. Here, we benchmarked 41 methods for integrative DE analysis of scRNA-seq data. Batch effects, data sparsity, and sample heterogeneity substantially affected the performance of DE analysis. Based on various simulations and real data analyses, we suggest several methods that performed well. In particular, the bulk RNA-seq tool edgeR with observation weights and the scRNA-seq tool MAST performed well overall. Remarkably, analysis for a specific cell type outperformed that of large-scale bulk sample data in prioritizing disease-related genes and pathways.