Choice of library size normalization and statistical methods for differential gene expression analysis in balanced two-group comparisons for RNA-seq studies

Li, Xiaohong; Cooper, Nigel G. F.; O’Toole, Timothy E.; Rouchka, Eric C.

doi:10.1186/s12864-020-6502-7

Cited by 32 publications (31 citation statements); references 72 publications


“…All DE analyses were done with R software (version 3.5.3) and the edgeR package [24] (version 3.22.5). Trimmed-mean M values (TMM) normalization was performed to normalize the counts among the different samples [37][38][39][40]. As high dispersion of low counts interfered with some of the statistical approximations used in edgeR, genes with low counts were filtered out using the filterByExpr function as recommended in the user's guide.…”

Section: DE Analysis of the Collected Raw Count Data (mentioning; confidence: 99%)
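The quoted workflow removes low-count genes with edgeR's `filterByExpr` and then applies trimmed mean of M-values (TMM) normalization. As an illustration only, the plain-Python sketch below mimics both steps; it is not edgeR. `simple_expr_filter` is a hypothetical, simplified approximation of `filterByExpr` defaults (a CPM cutoff derived from `min_count` and the median library size, plus a minimum total count), and the TMM functions follow the commonly documented defaults (30% trimming of M-values, 5% trimming of A-values, inverse-variance weights, an upper-quartile-based reference sample). Newer edgeR releases may pick the reference sample differently.

```python
import math
import statistics


def simple_expr_filter(counts, min_samples, min_count=10, min_total=15):
    """Hypothetical, simplified stand-in for edgeR::filterByExpr (not its exact logic).

    counts: list of samples, each a list of raw gene counts in the same gene order.
    A gene is kept if its CPM reaches min_count / median library size (as CPM)
    in at least `min_samples` samples and its total count is >= min_total.
    """
    lib_sizes = [sum(sample) for sample in counts]
    cpm_cutoff = min_count / statistics.median(lib_sizes) * 1e6
    keep = []
    for g in range(len(counts[0])):
        n_above = sum(s[g] / n * 1e6 >= cpm_cutoff for s, n in zip(counts, lib_sizes))
        total = sum(s[g] for s in counts)
        keep.append(n_above >= min_samples and total >= min_total)
    return keep


def average_ranks(values):
    """1-based ranks with ties averaged (like R's default rank())."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def tmm_factor(obs, ref, logratio_trim=0.3, sum_trim=0.05):
    """Weighted, trimmed mean of M-values of sample `obs` against sample `ref`."""
    n_o, n_r = sum(obs), sum(ref)
    m_vals, a_vals, variances = [], [], []
    for y_o, y_r in zip(obs, ref):
        if y_o == 0 or y_r == 0:
            continue  # M and A are not finite for zero counts, so the gene is skipped
        p_o, p_r = y_o / n_o, y_r / n_r
        m_vals.append(math.log2(p_o / p_r))                   # M: log fold change
        a_vals.append((math.log2(p_o) + math.log2(p_r)) / 2)  # A: average log expression
        # Approximate variance of M; its inverse is used as the weight.
        variances.append((n_o - y_o) / (n_o * y_o) + (n_r - y_r) / (n_r * y_r))
    if not m_vals or max(abs(m) for m in m_vals) < 1e-6:
        return 1.0  # same composition as the reference: no adjustment
    n = len(m_vals)
    lo_m = math.floor(n * logratio_trim) + 1
    hi_m = n + 1 - lo_m
    lo_a = math.floor(n * sum_trim) + 1
    hi_a = n + 1 - lo_a
    rank_m, rank_a = average_ranks(m_vals), average_ranks(a_vals)
    num = den = 0.0
    for m, v, rm, ra in zip(m_vals, variances, rank_m, rank_a):
        if lo_m <= rm <= hi_m and lo_a <= ra <= hi_a:  # gene survives both trims
            num += m / v
            den += 1 / v
    return 2 ** (num / den) if den > 0 else 1.0


def tmm_norm_factors(counts):
    """TMM factors for all samples, rescaled to a geometric mean of 1."""
    lib_sizes = [sum(s) for s in counts]
    # Reference: sample whose upper quartile of count proportions is closest to the mean.
    uq = [statistics.quantiles([y / n for y in s], n=4, method="inclusive")[2]
          for s, n in zip(counts, lib_sizes)]
    mean_uq = statistics.fmean(uq)
    ref_idx = min(range(len(counts)), key=lambda k: abs(uq[k] - mean_uq))
    factors = [tmm_factor(s, counts[ref_idx]) for s in counts]
    geo_mean = math.exp(statistics.fmean([math.log(f) for f in factors]))
    return [f / geo_mean for f in factors]
```

Typical use is to filter first (`keep = simple_expr_filter(counts, min_samples=2)`), drop the rejected genes, and then compute `tmm_norm_factors` on the remaining counts; each sample's effective library size is its total count multiplied by its factor. In edgeR itself the corresponding calls are `filterByExpr`, subsetting with `keep.lib.sizes = FALSE`, and `calcNormFactors`.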

High heterogeneity undermines generalization of differential expression results in RNA-Seq analysis

et al. 2021


Background: RNA sequencing (RNA-Seq) has been widely applied in oncology to monitor transcriptome changes. However, whether the high variation in gene expression levels caused by tumor heterogeneity affects the reproducibility of differential expression (DE) results has rarely been studied. Here, we investigated the reproducibility of DE results for each number of biological replicates from 3 to 24 and explored why many differentially expressed genes (DEGs) were not reproducible.

Results: Our findings demonstrate that poor reproducibility of DE results occurs not only with small sample sizes but also with relatively large ones. Many of the detected DEGs are specific to the samples used rather than genuinely differentially expressed between conditions. Poor reproducibility of DE results is mainly caused by high variation in the expression level of the same gene across samples. Although biological variation may account for much of this variation, the effect of outlier counts also needs to be taken seriously, as outliers severely interfere with DE analysis.

Conclusions: High heterogeneity exists not only in the tumor tissue samples of each cancer type studied but also in normal samples. This heterogeneity leads to poor reproducibility of DEGs, undermining the generalization of differential expression results. Therefore, large sample sizes (at least 10 where possible) should be used in RNA-Seq experimental designs to reduce the impact of biological variability, and DE results should be interpreted cautiously unless they have been soundly validated.
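One simple way to quantify the kind of reproducibility discussed in this abstract is to call DEGs repeatedly on random subsets of k replicates per group and measure how much the resulting gene lists overlap. The sketch below uses the mean pairwise Jaccard index as the overlap measure; this is an illustrative assumption, not necessarily the metric used in the cited study, and `call_degs` is a hypothetical user-supplied function (for example, a wrapper around an edgeR or DESeq2 run) that returns gene IDs.

```python
import random
from itertools import combinations


def jaccard(a, b):
    """Size of the intersection divided by size of the union (1.0 if both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def mean_pairwise_jaccard(deg_lists):
    """Average Jaccard index over all pairs of DEG lists (needs at least two lists)."""
    pairs = list(combinations(deg_lists, 2))
    return sum(jaccard(x, y) for x, y in pairs) / len(pairs)


def subsample_reproducibility(samples_a, samples_b, k, call_degs, n_rounds=10, seed=0):
    """Repeatedly draw k replicates per group, call DEGs, and score list overlap.

    call_degs(sub_a, sub_b) is a hypothetical, user-supplied DE routine that
    returns an iterable of gene IDs called differentially expressed.
    """
    rng = random.Random(seed)  # fixed seed so the subsampling is repeatable
    deg_lists = []
    for _ in range(n_rounds):
        sub_a = rng.sample(samples_a, k)
        sub_b = rng.sample(samples_b, k)
        deg_lists.append(set(call_degs(sub_a, sub_b)))
    return mean_pairwise_jaccard(deg_lists)
```

Evaluating this score over a range of k (the study considered 3 to 24 replicates) shows how list overlap changes with sample size; values well below 1 indicate that many DEGs depend on which samples happened to be drawn.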


“…However, most studies follow cohort analysis using standard statistical algorithms to determine DEGs, where various normalization methods followed by negative binomial distributions or Poisson are utilized to model the gene count data. Cutoff score based on P-value generated by statistical modeling is then applied along with expression change threshold [124, 125]. This method of analysis has been successful in different ways, as they could identify biomarkers and prognostic markers and determine which genes are usually overexpressed or downregulated in certain cancer types [126].…”

Section: Discussion (mentioning; confidence: 99%)
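The thresholding step described in this citation statement can be sketched as follows: raw p-values from the count model are converted to Benjamini-Hochberg adjusted values (false discovery rate), and a gene is called a DEG only if it passes both an FDR cutoff and an absolute log2 fold-change cutoff. The cutoffs below (FDR < 0.05, |log2FC| >= 1) are common illustrative defaults, not values taken from the cited work, and the p-values and fold changes are assumed to come from an upstream negative binomial or Poisson model.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (same convention as R's p.adjust(method="BH"))."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i], reverse=True)  # largest p first
    adjusted = [0.0] * n
    running_min = 1.0  # also caps adjusted values at 1
    for pos, i in enumerate(order):
        rank = n - pos  # rank of p-value i in ascending order (1 = smallest)
        running_min = min(running_min, pvals[i] * n / rank)  # keeps adjusted values monotone
        adjusted[i] = running_min
    return adjusted


def select_degs(genes, log2fc, pvals, fdr=0.05, min_abs_lfc=1.0):
    """Genes passing both the FDR cutoff and the absolute log2 fold-change cutoff."""
    padj = bh_adjust(pvals)
    return [g for g, lfc, q in zip(genes, log2fc, padj)
            if q < fdr and abs(lfc) >= min_abs_lfc]
```

For example, with p-values `[0.01, 0.04, 0.03, 0.5]` the adjusted values are 0.04, about 0.053, about 0.053, and 0.5, so at FDR < 0.05 only the first gene can pass, and then only if its fold change also clears the threshold.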

The overexpression of DNA repair genes in invasive ductal and lobular breast carcinomas: Insights on individual variations and precision medicine

et al. 2021


In the era of precision medicine, analyzing the transcriptomic profile of patients is essential to tailor the appropriate therapy. In this study, we explored transcriptional differences between two invasive breast cancer subtypes, infiltrating ductal carcinoma (IDC) and lobular carcinoma (LC), using RNA-Seq data deposited in the TCGA-BRCA project. We identified 3854 differentially expressed genes between normal ductal tissues and IDC, and 663 differentially expressed genes in the IDC to LC comparison. We then focused on DNA repair genes because of their known effects on patients' response and resistance to therapy. Here we report that 36 DNA repair genes are overexpressed in a significant number of both IDC and LC patient samples. Despite this upregulation in a significant number of samples, the expression levels of the repair genes vary noticeably across patients of the same cancer subtype. The same trend holds for miRNA expression, which also varies remarkably between patient samples of the same cancer subtype. These individual variations could underlie the differential response of patients to treatment. The future of cancer diagnostics and therapy will inevitably depend on the analysis of high-throughput genomic and transcriptomic data. However, we propose that performing analysis on individual patients, rather than on large sets of pooled patient samples, will be necessary to determine the best treatment and reduce therapy resistance.


“…Of course, many factors may promote cancer, such as chemicals and radiation, as well as genetic defects in the repair and replication molecular machinery. To gain insight into such a complex problem as a molecular approach to cancer, together with a still-evolving protocol of RNA-seq treatment regarding normalization procedure or error rate (Li et al., 2020), a robust measure was […] The numbers in the table represent the proportion (%) of tumors of a given cancer type that showed the gene among the top-20 most connected proteins of the subnetwork of up-regulated genes. The pink color indicates genes up-regulated in at least 70% of tumor samples of each cancer type.…”

Section: Galaxy Pipeline (mentioning; confidence: 99%)
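The table described in this citation statement counts, for each cancer type, how often a gene ranks among the 20 most connected proteins of the subnetwork formed by a tumor's up-regulated genes, and it flags genes up-regulated in at least 70% of tumors. A minimal sketch of that bookkeeping follows. It assumes an undirected protein-protein interaction edge list without duplicate edges, uses node degree as the connectivity measure, and breaks ties alphabetically; these choices and all function names are illustrative assumptions, not the cited pipeline's actual implementation.

```python
from collections import Counter


def top_connected(up_genes, edges, k=20):
    """Top-k genes by degree in the subnetwork induced by one tumor's up-regulated genes."""
    up = set(up_genes)
    degree = Counter({g: 0 for g in up})  # include isolated genes with degree 0
    for a, b in edges:
        if a in up and b in up and a != b:  # keep only edges inside the subnetwork
            degree[a] += 1
            degree[b] += 1
    ranked = sorted(degree.items(), key=lambda item: (-item[1], item[0]))
    return [g for g, _ in ranked[:k]]


def hub_frequency(tumors, edges, k=20):
    """Percentage of tumors in which each gene is among that tumor's top-k hubs."""
    counts = Counter()
    for up_genes in tumors.values():
        counts.update(top_connected(up_genes, edges, k))
    n = len(tumors)
    return {g: 100.0 * c / n for g, c in counts.items()}


def frequently_upregulated(tumors, threshold=0.7):
    """Genes up-regulated in at least `threshold` (fraction) of the tumors."""
    counts = Counter(g for up_genes in tumors.values() for g in set(up_genes))
    n = len(tumors)
    return {g for g, c in counts.items() if c / n >= threshold}
```

Here `tumors` maps a hypothetical tumor ID to its list of up-regulated genes for one cancer type; running `hub_frequency` per cancer type gives one column of percentages, and `frequently_upregulated` gives the genes that would be highlighted.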

Galaxy and MEAN Stack to Create a User-Friendly Workflow for the Rational Optimization of Cancer Chemotherapy

Pires, Silva, Weyssow et al. 2021

Front. Genet.


One aspect of personalized medicine is identifying specific therapeutic targets based on the gene expression profile of each individual patient. Real-world implementation of this approach is best achieved through bioinformatics systems that are user-friendly for healthcare professionals. In this report, we present an online platform whose interface is built with the MEAN stack and supported by a Galaxy pipeline. The pipeline targets connection hubs in the subnetworks formed by interactions between the proteins encoded by genes that are up-regulated in tumors. This strategy has previously been shown to be suitable for inhibiting tumor growth and metastasis in vitro. To this end, Perl and Python scripts were wrapped in Galaxy to translate RNA-seq data into protein targets suitable for the chemotherapy of solid tumors. We then validated the target diagnosis process by reference to (i) subnetwork entropy, (ii) the critical value of the probability density of differential gene expression, and (iii) the inhibition of the most relevant targets according to TCGA and GDC data. Finally, the most relevant targets identified by the pipeline are stored in MongoDB and can be accessed through the internet portal, which uses Angular libraries to support mobile and other small-screen devices.
