2019
DOI: 10.1371/journal.pone.0219102

Cluster analysis on high dimensional RNA-seq data with applications to cancer research - An evaluation study

Abstract: Background: Clustering of gene expression data is widely used to identify novel subtypes of cancer. Plenty of clustering approaches have been proposed, but there is a lack of knowledge regarding their relative merits and how data characteristics influence the performance. We evaluate how cluster analysis choices affect the performance by studying four publicly available human cancer data sets: breast, brain, kidney and stomach cancer. In particular, we focus on how the sample size, distribution of subtypes and s…

Cited by 27 publications (17 citation statements)
References 40 publications
“…Thus, we generally observe a significant decrease in performance for larger K_true, although FSCseq's performance is most robust to this effect. In addition, similarly sporadic performance of HC in RNA-seq was also observed previously (Vidman, Källberg and Rydén, 2019), suggesting unreliability of HC for clustering RNA-seq gene expression. Finally, although the MC methods performed similarly on average, the best performing method between these three transformations varied for each set of simulation conditions, reflecting the fact that the optimal transformation may not be known in advance.…”
Section: Numerical Examples (supporting)
confidence: 77%
“…Second, additional anticoagulants (i.e., intravenous heparin or direct oral anticoagulants), used on some patients, might have contributed to the dynamic changes in fibrinogen and D-dimer levels observed in this study. Third, while there is no definite rule for the minimum sample size for clustering approaches [29] and the indicative variables appear to be distributed appropriately, our small sample size requires extra caution when interpreting our findings. Further studies are needed to confirm our findings.…”
Section: Discussion (mentioning)
confidence: 86%
“…Controls (C) cultured in the presence of fetal bovine serum (FBS) were also included. Unsupervised hierarchical clustering (Vidman et al, 2019) indicated that the transcriptomes clustered well together according to the serum donors’ bonding history, except the virgin (V) group that exhibited the lowest discrimination (Figure 3—figure supplement 2). Differential gene expression analysis was performed as described before by using the iDEP platform (Ge et al, 2018).…”
Section: Results (mentioning)
confidence: 99%