Semi-deconvolution of bulk and single-cell RNA-seq data with application to metastatic progression in breast cancer

Lei, Hongwei; Guo, Xiaoyan A.; Tao, Yifeng; Ding, Kai; Fu, Xuecong; Oesterreich, Steffi; Lee, Adrian V.; Schwartz, Russell

doi:10.1093/bioinformatics/btac262

Cited by 1 publication (1 citation statement); references 38 publications (37 reference statements)

“…Complete algorithms return both […] and […]. Here we selected nine deconvolution algorithms: DeconRNASeq [12], lsfit [31], DWLS [14], NMF [3], two versions of deconf (original and fast) [32, 33], bMIND [34], RADs [35], and Scaden [36]. Scaden was a supervised deep learning algorithm that required labels of sc-RNASeq data, hence was not applicable for our cancer datasets due to lack of cancer cell type annotations.…”
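The excerpt separates "complete" deconvolution algorithms, which return two outputs, from reference-based ones. The symbols for those outputs were lost in extraction; given the abstract below, they are presumably the cell type-specific expression signatures and the per-sample mixture coefficients. NMF, one of the nine listed methods, is a complete algorithm in this sense. The following is a minimal sketch under stated assumptions, not the citing paper's implementation. It factorizes a non-negative genes × samples bulk matrix X into W (genes × cell types) and H (cell types × samples). The names `W`, `H`, `nmf`, and `proportions` are hypothetical, and the solver shown is the Lee–Seung multiplicative update for the Frobenius objective, chosen here for brevity.

```python
import random


def matmul(a, b):
    """Multiply two dense matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def frob_error(x, w, h):
    """Squared Frobenius norm of X - W H."""
    wh = matmul(w, h)
    return sum((x[i][j] - wh[i][j]) ** 2 for i in range(len(x)) for j in range(len(x[0])))


def nmf(x, k, iters=300, seed=0, eps=1e-12):
    """Hypothetical sketch: factor non-negative X (genes x samples) into
    W (genes x k, signatures) and H (k x samples, mixture coefficients)
    with Lee-Seung multiplicative updates (Frobenius loss)."""
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    w = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T X) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        # W <- W * (X H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return w, h


def proportions(h):
    """Rescale each sample's column of H to sum to 1 (a mixture-proportion reading)."""
    totals = [sum(col) for col in zip(*h)]
    return [[h[i][j] / totals[j] for j in range(len(totals))] for i in range(len(h))]


if __name__ == "__main__":
    # Toy bulk matrix: 4 genes x 3 samples built from 2 hypothetical cell types.
    w_true = [[5.0, 0.5], [4.0, 1.0], [0.5, 6.0], [1.0, 3.0]]
    h_true = [[0.8, 0.3, 0.1], [0.2, 0.7, 0.9]]
    x = matmul(w_true, h_true)
    w, h = nmf(x, k=2)
    print("reconstruction error:", frob_error(x, w, h))
    print("mixture proportions per sample:", proportions(h))
```

NMF factors are identifiable only up to permutation and scaling of the cell-type axis. Recovered columns of W therefore still have to be matched to known cell types before they can be compared with ground truth.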

Section: Methods (mentioning; confidence: 99%)

Assessing transcriptomic heterogeneity of single-cell RNASeq data by bulk-level gene expression data

Tiong, Luzhbin, Yeang

2024

BMC Bioinformatics

Background: Single-cell RNA sequencing (sc-RNASeq) data illuminate transcriptomic heterogeneity but also contain a high level of noise, abundant missing entries and sometimes inadequate or no cell type annotations at all. Bulk-level gene expression data lack direct information about cell population composition but are more robust, more complete and often better annotated. We propose a modeling framework that integrates bulk-level and single-cell RNASeq data to address the deficiencies of each, leverage their complementary strengths and enable a more comprehensive inference of their transcriptomic heterogeneity. Standard approaches factorize the bulk-level data with a single algorithm and, in some methods, treat single-cell RNASeq data as references for decomposing the bulk-level data. In contrast, we employed multiple deconvolution algorithms to factorize the bulk-level data, constructed probabilistic graphical models of cell-level gene expression from the decomposition outcomes, and compared the log-likelihood scores of these models on the single-cell data. We term this framework backward deconvolution because inference proceeds from coarse-grained bulk-level data to fine-grained single-cell data. The abundant missing entries in sc-RNASeq data strongly affect log-likelihood scores. We therefore also developed a criterion for including or excluding zero entries when computing log-likelihood scores.

Results: We selected nine deconvolution algorithms and validated backward deconvolution on five datasets. In in-silico mixtures of mouse sc-RNASeq data, the log-likelihood scores of the deconvolution algorithms were strongly anticorrelated with their errors in the mixture coefficients and the cell type-specific gene expression signatures. In real bulk-level mouse data, the sample mixture coefficients were unknown, but the log-likelihood scores were strongly correlated with the accuracy rates of the inferred cell types. In data from autism spectrum disorder (ASD) patients and normal controls, we found that ASD brains had higher fractions of astrocytes and lower fractions of NRGN-expressing neurons than normal controls. In breast cancer and low-grade glioma (LGG) datasets, we compared the log-likelihood scores of three simple hypotheses about the gene expression patterns of the cell types underlying the tumor subtypes. The model in which tumors of each subtype were dominated by one cell type consistently outperformed an alternative model. In that alternative, each cell type had elevated expression in one gene group and tumors were mixtures of those cell types. The superiority of the former model was further supported by comparing the real breast cancer sc-RNASeq clusters with those generated from simulated sc-RNASeq data.

Conclusions: The results indicate that backward deconvolution serves as a sensible model selection tool for deconvolution algorithms. It also helps discriminate between hypotheses about the cell type compositions underlying heterogeneous specimens such as tumors.
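The scoring step of backward deconvolution can be illustrated with a small sketch. Each deconvolution algorithm's output defines a probabilistic model of single-cell expression, and the algorithm whose model gives the single-cell data the highest log-likelihood is preferred. The abstract does not specify the graphical model, so everything below is an assumption made for illustration. Each cell is modeled as belonging to one cell type k with prior probability `priors[k]`; these priors could, for example, be taken from the average inferred mixture proportions. Each gene's count is drawn as Poisson with mean equal to that type's inferred signature value. The `skip_zeros` switch only shows where zero entries would be excluded; it does not reproduce the authors' inclusion/exclusion criterion. All function names are hypothetical.

```python
import math


def poisson_logpmf(x, lam):
    """log P(X = x) for a Poisson(lam) count."""
    return x * math.log(lam) - lam - math.lgamma(x + 1)


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def cell_loglik(cell, signatures, priors, skip_zeros=False, floor=1e-6):
    """Hypothetical mixture model: log-likelihood of one cell's counts.
    cell[g]: observed count of gene g; signatures[g][k]: expected count of
    gene g in cell type k (one algorithm's inferred signatures);
    priors[k]: prior probability that a cell is of type k."""
    per_type = []
    for k, pk in enumerate(priors):
        ll = math.log(pk)
        for g, x in enumerate(cell):
            if skip_zeros and x == 0:
                continue  # treat zeros as missing entries, not observed zeros
            ll += poisson_logpmf(x, max(signatures[g][k], floor))
        per_type.append(ll)
    return logsumexp(per_type)  # marginalize over the unknown cell type


def backward_score(cells, signatures, priors, skip_zeros=False):
    """Total log-likelihood of the single-cell data under one algorithm's output."""
    return sum(cell_loglik(c, signatures, priors, skip_zeros) for c in cells)


if __name__ == "__main__":
    # Toy single-cell counts (cells x genes) and two hypothetical algorithm outputs.
    cells = [[9, 1, 0], [10, 0, 1], [0, 8, 9], [1, 9, 10]]
    candidates = {
        "algorithm_A": [[10.0, 0.5], [0.5, 9.0], [0.5, 10.0]],
        "algorithm_B": [[5.0, 5.0], [5.0, 5.0], [5.0, 5.0]],
    }
    for name, sig in candidates.items():
        print(name, backward_score(cells, sig, [0.5, 0.5]))
```

Scores are comparable only across algorithms evaluated with the same zero-handling setting. Dropping zeros changes the number of terms summed, so a score computed with `skip_zeros=True` should not be ranked against one computed with `skip_zeros=False`.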