2018
DOI: 10.1186/s12859-018-2556-9

Subject level clustering using a negative binomial model for small transcriptomic studies

Abstract: Background: Unsupervised clustering represents one of the most widely applied methods in analysis of high-throughput ‘omics data. A variety of unsupervised model-based or parametric clustering methods and non-parametric clustering methods have been proposed for RNA-seq count data, most of which perform well for large samples, e.g. N ≥ 500. A common issue when analyzing limited samples of RNA-seq count data is that the data follows an over-dispersed distribution, and thus a Negative Binomial likelihood model is o…

Cited by 8 publications (10 citation statements)
References 44 publications
“…Following the guidelines by Celeux and Govaert (1992) and Rose (1998), we set the annealing rate r = 0.9, similar to other methods (Si et al, 2013; Li et al, 2018). It is suggested that the entropy should be sufficiently large in early iterations, eventually shrinking to 0 in later iterations (Klein and Dubes, 1989; van Laarhoven and Aarts, 1987; Rose, 1998).…”
Section: E-mentioning
confidence: 99%
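
The annealing behaviour described in this statement can be illustrated with a minimal sketch. It assumes a geometric schedule in which a temperature is multiplied by r = 0.9 at each iteration and cluster membership probabilities are a tempered softmax of per-cluster log-likelihoods; the function names, the starting temperature, and the example log-likelihoods are hypothetical, and the citing paper's exact update rule is not given in the excerpt.

```python
import math

def tempered_posteriors(log_liks, temperature):
    """Softmax of log-likelihoods divided by a temperature (assumed form)."""
    scaled = [ll / temperature for ll in log_liks]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]

def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def annealing_entropies(log_liks, t0=10.0, rate=0.9, n_iter=100):
    """Track membership entropy while the temperature decays geometrically."""
    temperature = t0  # hypothetical starting temperature
    out = []
    for _ in range(n_iter):
        out.append(entropy(tempered_posteriors(log_liks, temperature)))
        temperature *= rate  # annealing rate r = 0.9, as in the quoted text
    return out

# Hypothetical log-likelihoods of one sample under three clusters.
entropies = annealing_entropies([-10.0, -12.0, -15.0])
```

Under these assumptions the entropy starts close to its maximum (soft, near-uniform memberships) and decays toward 0 (hard assignments) as the temperature shrinks, which matches the qualitative behaviour the statement describes.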
“…Many studies have demonstrated the clinical utility of such molecular subtypes, where significant differences in patient prognosis and treatment response between subtypes have been observed (Perou et al, 2000; Chia et al, 2012; Mao et al, 2017). However, few methods are able to cluster samples whose genes (features) are measured via RNA-seq, a common platform for measuring gene expression based upon high-throughput sequencing (Mo et al, 2013; Li et al, 2018). Gene expression in RNA-seq studies is typically quantified in terms of gene-level read counts, defined by the number of sequencing reads mapping to protein-coding regions in a particular gene following sequence alignment.…”
mentioning
confidence: 99%
“…iCluster+ also performs feature selection by inducing sparsity via the lasso penalty, but ignores potential overdispersion, or extra-Poisson variation, in counts. NBMB (Li et al, 2018) accounts for overdispersion via the negative-binomial distribution, but cannot identify cluster-discriminatory genes. In addition, neither method is able to adjust for confounding factors, such as batch effects or differences in sequencing depth.…”
mentioning
confidence: 99%
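
To make "extra-Poisson variation" concrete, here is a small sketch of the negative binomial in the common mean/dispersion parameterization, where the variance is mu + phi * mu^2 rather than the Poisson value mu. The parameterization, the function names, and the values mu = 10 and phi = 0.5 are assumptions chosen for illustration; they are not taken from NBMB or from the citing paper.

```python
import math

def nb_log_pmf(y, mu, phi):
    """Negative binomial log-pmf with mean mu and dispersion phi (size = 1/phi)."""
    size = 1.0 / phi
    return (
        math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
        + size * math.log(size / (size + mu))
        + y * math.log(mu / (size + mu))
    )

def nb_moments(mu, phi, y_max=5000):
    """Numerically sum the pmf over 0..y_max to get total mass, mean, variance."""
    probs = [math.exp(nb_log_pmf(y, mu, phi)) for y in range(y_max + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return sum(probs), mean, var

# Illustrative values only: a gene with mean count 10 and dispersion 0.5.
total, mean, var = nb_moments(mu=10.0, phi=0.5)
# var should be close to mu + phi * mu**2, well above the Poisson variance mu.
```

A Poisson model with the same mean would force the variance to equal the mean, so counts this variable would be treated as far more surprising than they are; the dispersion parameter is what absorbs the extra spread.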