High heterogeneity undermines generalization of differential expression results in RNA-Seq analysis

Cui, Weitong; Xue, Huaru; Jin, Jinghua; Tian, Xuewen; Wang, Qinglu

doi:10.1186/s40246-021-00308-5

Cited by 25 publications

(29 citation statements)

References 43 publications


“…However, as shown in Figure 1, when (FDR(0.05), LFC(1)) is applied, the number of DEGs detected positively correlates with n if n is below 10. Similar phenomena have been reported in previous studies using RNA‐seq read count data from mouse strains (Soneson & Delorenzi, 2013), yeast (Schurch et al., 2016), tomato plants (Lamarre et al., 2018), and human tissues (Cui et al., 2021), and cell lines (Liu et al., 2014). The results of the parallel analysis using HeLa cells (Supporting information Figures S7‐S9) are also consistent with those obtained from human tumor and normal tissue samples.…”

Section: Discussion (supporting)

confidence: 86%

“…Poor reproducibility of DEGs has been shown in several studies using different datasets from tomato plants (Lamarre et al., 2018), yeast (Schurch et al., 2016), mouse strains (Soneson & Delorenzi, 2013), and human tissues (Cui et al., 2021), and cell lines (Liu et al., 2014); however, the findings of all these studies were based on DE analyses using the common threshold value for FC (|log2FC| ≥ 1) and FDR (FDR < 0.05). The question was whether the reproducibility of DE results could be improved by increasing significance stringency.…”

Section: Introduction (mentioning)

confidence: 99%


Effect of high variation in transcript expression on identifying differentially expressed genes in RNA‐seq analysis

Cui, Xue, Geng, et al. 2021. Annals of Human Genetics. (Self-citation)


Summary: Great efforts have been made on the algorithms that deal with RNA‐seq data to enhance the accuracy and efficiency of differential expression (DE) analysis. However, no consensus has been reached on the proper threshold values of fold change and adjusted p‐value for filtering differentially expressed genes (DEGs). It is generally believed that the more stringent the filtering threshold, the more reliable the result of a DE analysis. Nevertheless, by analyzing the impact of both adjusted p‐value and fold change thresholds on DE analyses, with RNA‐seq data obtained for three different cancer types from the Cancer Genome Atlas (TCGA) database, we found that, for a given sample size, the reproducibility of DE results became poorer when more stringent thresholds were applied. No matter which threshold level was applied, the overlap rates of DEGs were generally lower for small sample sizes than for large sample sizes. The raw read count analysis demonstrated that the transcript expression of the same gene in different samples, whether in tumor groups or in normal groups, showed high variations, which resulted in a drastic fluctuation in fold change values and adjusted p‐values when different sets of samples were used. Overall, more stringent thresholds did not yield more reliable DEGs due to high variations in transcript expression; the reliability of DEGs obtained with small sample sizes was more susceptible to these variations. Therefore, less stringent thresholds are recommended for screening DEGs. Moreover, large sample sizes should be considered in RNA‐seq experimental designs to reduce the interfering effect of variations in transcript expression on DEG identification.
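The overlap-rate comparison described in this abstract can be made concrete with a small sketch. It filters two DE result tables by fold-change and adjusted p-value thresholds and measures how much the resulting DEG lists agree. The abstract does not state how the overlap rate was computed, so the Jaccard index used here is an assumption; the function names, the result container, and the toy values are all hypothetical and only show the mechanics.

```python
from typing import Dict, Set, Tuple

# Hypothetical container: gene name -> (log2 fold change, adjusted p-value).
DEResult = Dict[str, Tuple[float, float]]


def select_degs(result: DEResult, lfc_cutoff: float = 1.0,
                fdr_cutoff: float = 0.05) -> Set[str]:
    """Return genes passing |log2FC| >= lfc_cutoff and adjusted p < fdr_cutoff."""
    return {
        gene
        for gene, (lfc, fdr) in result.items()
        if abs(lfc) >= lfc_cutoff and fdr < fdr_cutoff
    }


def overlap_rate(a: Set[str], b: Set[str]) -> float:
    """Jaccard index of two DEG sets (assumed definition of 'overlap rate')."""
    union = a | b
    if not union:
        return 0.0  # no DEGs in either run: define the overlap as zero
    return len(a & b) / len(union)


# Invented toy results for two analyses run on different sample subsets.
run_a: DEResult = {"G1": (2.5, 0.001), "G2": (1.2, 0.03),
                   "G3": (-1.1, 0.04), "G4": (0.4, 0.20)}
run_b: DEResult = {"G1": (1.8, 0.002), "G2": (0.9, 0.06),
                   "G3": (-1.6, 0.01), "G4": (1.3, 0.04)}

# Common thresholds: |log2FC| >= 1, FDR < 0.05.
lenient = overlap_rate(select_degs(run_a), select_degs(run_b))
# More stringent thresholds: |log2FC| >= 2, FDR < 0.01.
strict = overlap_rate(select_degs(run_a, 2.0, 0.01),
                      select_degs(run_b, 2.0, 0.01))

print(f"lenient overlap: {lenient:.2f}, strict overlap: {strict:.2f}")
```

In practice the two result tables would come from repeated DE analyses on randomly drawn sample subsets of the same size, and the overlap would be averaged over many draws; the toy values above are not data from the study.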


“…Some challenges for single source RNA-seq data that significantly affect the generalization of the analysis are the varying data acquisition protocols [71], intratumor heterogeneity [72] and local mutation burden [73]. These are prominent aspects of non-small cell lung carcinomas.…”

Section: Discussion (mentioning)

confidence: 99%

Deep Radiotranscriptomics of Non-Small Cell Lung Carcinoma for Assessing Molecular and Histology Subtypes with a Data-Driven Analysis

et al. 2021


Radiogenomic and radiotranscriptomic studies have the potential to pave the way for a holistic decision support system built on genomics, transcriptomics, radiomics, deep features and clinical parameters to assess treatment evaluation and care planning. The integration of invasive and routine imaging data into a common feature space has the potential to yield robust models for inferring the drivers of underlying biological mechanisms. In this non-small cell lung carcinoma study, a multi-omics representation comprised deep features and transcriptomics was evaluated to further explore the synergetic and complementary properties of these diverse multi-view data sources by utilizing data-driven machine learning models. The proposed deep radiotranscriptomic analysis is a feature-based fusion that significantly enhances sensitivity by up to 0.174 and AUC by up to 0.22, compared to the baseline single source models, across all experiments on the unseen testing set. Additionally, a radiomics-based fusion was also explored as an alternative methodology yielding radiomic signatures that are comparable to several previous publications in the field of radiogenomics. Furthermore, the machine learning multi-omics analysis based on deep features and transcriptomics achieved an AUC performance of up to 0.831 ± 0.09/0.925 ± 0.04 for the examined molecular and histology subtypes analysis, respectively. The clinical impact of such high-performing models can add prognostic value and lead to optimal treatment assessment by targeting specific oncogenes, namely the response of tyrosine kinase inhibitors of EGFR mutated or predicting the chemotherapy resistance of KRAS mutated tumors.


“…High-throughput analyses of gene expression hold great promise for the identification of biomarkers of clinical status, with the potential of predicting outcome, response to therapy, or informing researchers about molecular mechanisms underpinning disease onset and progression and identifying therapeutic targets [1]. Nevertheless, lists of candidate genes obtained through transcriptome-based studies have proven difficult to reproduce [2][3][4][5][6], raising a note of caution regarding conclusions driven by single sets of experiments. Sample collection and processing methods, protocols, and platforms may impact on the resulting gene signatures, making them non-overlapping between studies [7].…”

Section: Introduction (mentioning)

confidence: 99%

Meta-Analysis of Microdissected Breast Tumors Reveals Genes Regulated in the Stroma but Hidden in Bulk Analysis

Savino, Marzo, Provero, et al. 2021. Cancers.


Transcriptome data provide a valuable resource for the study of cancer molecular mechanisms, but technical biases, sample heterogeneity, and small sample sizes result in poorly reproducible lists of regulated genes. Additionally, the presence of multiple cellular components contributing to cancer development complicates the interpretation of bulk transcriptomic profiles. To address these issues, we collected 48 microarray datasets derived from laser capture microdissected stroma or epithelium in breast tumors and performed a meta-analysis identifying robust lists of differentially expressed genes. This was used to create a database with carefully harmonized metadata that we make freely available to the research community. As predicted, combining the results of multiple datasets improved statistical power. Moreover, the separate analysis of stroma and epithelium allowed the identification of genes with different contributions in each compartment, which would not be detected by bulk analysis due to their distinct regulation in the two compartments. Our method can be profitably used to help in the discovery of biomarkers and the identification of functionally relevant genes in both the stroma and the epithelium. This database was made to be readily accessible through a user-friendly web interface.

