2020
DOI: 10.1186/s12859-020-03892-w

The shape of gene expression distributions matter: how incorporating distribution shape improves the interpretation of cancer transcriptomic data

Abstract: Background: In genomics, we often assume that continuous data, such as gene expression, follow a specific kind of distribution. However, we rarely stop to question the validity of this assumption, or consider how broadly applicable it may be to all genes in the transcriptome. Our study investigated the prevalence of a range of gene expression distributions in three different tumor types from The Cancer Genome Atlas (TCGA). Results: Surprising…

Citations: cited by 33 publications (25 citation statements)
References: 30 publications
“…We experimentally verify this hypothesis by comparing the distribution of the number of peaks per gene (see Figure 3 for a graphical representation) of the profiles of different cell types for each histone modification with the expected distributions of gene expression counts derived from the literature [27,28]. Figure 3 highlights that, as happens in RNA-seq experiments, very low or no signal is registered for the large majority of genes.…”
Section: Histone Signal Distribution; citation type: mentioning
confidence: 70%
“…In this phase, we investigate the possibility that a (relatively) high signal intensity of a histone modification is registered only in a fraction of genes. This hypothesis arises from the observation that in gene expression profiles, most genes are either constantly expressed or not expressed at all [27,28]. Consequently, if whole-genome histone modification profiles follow this behaviour, classical differential expression analysis techniques could be borrowed for processing histone signals.…”
Section: Histone Signal Distribution; citation type: mentioning
confidence: 99%
“…To model the heterogeneity of gene expression data under a statistical framework, it is vital that the distribution with the most appropriate fit for each gene’s expression profile be used [8]. While some statistical methods have appealed to the use of mixture models [6, 9] as an alternative to the widely-used negative binomial distribution [10, 11], they fail to investigate the range of different gene expression distributions that may be present in the scRNA-seq data as a first step.…”
Section: Introduction; citation type: mentioning
confidence: 99%