2021
DOI: 10.1214/20-aoas1407

Model-based feature selection and clustering of RNA-seq data for unsupervised subtype discovery

Abstract: Clustering is a form of unsupervised learning that aims to uncover latent groups within data based on similarity across a set of features. A common application of this in biomedical research is in delineating novel cancer subtypes from patient gene expression data, given a set of informative genes. However, it is typically unknown a priori which genes may be informative in discriminating between clusters, and what the optimal number of clusters is. Few methods exist for performing unsupervised clustering of RN…

Cited by 7 publications (4 citation statements). References 85 publications (93 reference statements).

“…As in perhaps all studies, the choice of the candidate tuning parameters cannot be fully objective. The adopted strategy has been very common in published studies [6, 21]. Such a choice of candidate values has led to satisfactory performance in our numerical studies.…”
Section: Methods (mentioning; confidence: 99%)
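The quoted passage does not say how one candidate is chosen from the grid. One common approach, assumed here rather than taken from the citing paper, is to fit the model at every candidate value (for example, each pair of a number of clusters K and a penalty strength) and keep the candidate with the smallest Bayesian information criterion, BIC = p ln n - 2 ln L, where L is the maximised likelihood, p the number of free parameters and n the number of observations. The model fitting itself is omitted, and the function names `bic` and `select_candidate` are hypothetical.

```python
import math
from typing import Dict, Hashable, Tuple


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion; lower values are preferred."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def select_candidate(fits: Dict[Hashable, Tuple[float, int]], n_obs: int) -> Hashable:
    """Return the candidate key whose fitted model has the smallest BIC.

    `fits` maps a candidate (e.g. a hypothetical (K, penalty) tuple) to a pair
    (maximised log-likelihood, number of free parameters) obtained from a
    model already fitted at that candidate value.
    """
    return min(fits, key=lambda c: bic(fits[c][0], fits[c][1], n_obs))
```

BIC trades goodness of fit against model size, so a larger K or a weaker penalty is only chosen when the likelihood gain outweighs the extra parameters.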
“…Moreover, the selected cell type-specific genes may additionally assist in understanding biological cell types. Examples of such developments are FSCseq [21] and snbClust [22], which are two negative binomial (NB) mixture model-based methods and perform feature selection using the penalisation technique. However, these models do not consider dropouts and may be ineffective with highly sparse data.…”
Section: Introduction (mentioning; confidence: 99%)
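As a hedged illustration of the NB mixture model class mentioned above (not the actual FSCseq or snbClust code), the sketch below computes the E-step of an NB mixture: the posterior probability that each sample belongs to each cluster, given fixed mixing proportions, cluster-specific gene means and gene-wise dispersions. Assumptions: the mean/dispersion parameterisation Var(y) = mu + phi * mu^2, genes conditionally independent given the cluster, and no size-factor normalisation. The penalisation used for feature selection would enter the M-step, which is not shown. The names `nb_logpmf` and `e_step` are hypothetical.

```python
import math
from typing import List, Sequence


def nb_logpmf(y: int, mu: float, phi: float) -> float:
    """Log-pmf of a negative binomial with mean mu > 0 and dispersion phi > 0,
    so that Var(y) = mu + phi * mu**2."""
    r = 1.0 / phi  # NB "size" parameter
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))


def e_step(counts: Sequence[Sequence[int]],
           weights: Sequence[float],
           means: Sequence[Sequence[float]],
           dispersions: Sequence[float]) -> List[List[float]]:
    """Posterior cluster membership probabilities for each sample.

    counts[i][g]   : read count of gene g in sample i
    weights[k]     : mixing proportion of cluster k
    means[k][g]    : NB mean of gene g in cluster k
    dispersions[g] : gene-wise NB dispersion, shared across clusters here
    """
    resp = []
    for y in counts:
        # log pi_k + sum over genes of log NB(y_g; mu_kg, phi_g), for each cluster k
        logp = [math.log(w) + sum(nb_logpmf(y_g, mu_g, phi_g)
                                  for y_g, mu_g, phi_g in zip(y, mu_k, dispersions))
                for w, mu_k in zip(weights, means)]
        m = max(logp)  # subtract the max (log-sum-exp) for numerical stability
        z = [math.exp(lp - m) for lp in logp]
        total = sum(z)
        resp.append([v / total for v in z])
    return resp
```

In a full EM fit these responsibilities would feed an M-step that re-estimates the weights, means and dispersions, and the two steps would alternate until the log-likelihood stops improving.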
“…Using all available features as input can negatively impact cluster accuracy and distinction [30]. Often, features are selected based on variability using a measure of mean absolute deviation (MAD) across all samples [31][32][33]. Thus, the most variable features are used as inputs for unsupervised learning.…”
Section: Clustering and Feature Selection (mentioning; confidence: 99%)
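The variability filter described in this statement can be sketched as follows: compute the mean absolute deviation of each feature across all samples and keep the top-ranked features. This is a minimal sketch assuming the data are given as a mapping from feature name to per-sample values (typically already normalised and log-transformed); the names `mean_abs_deviation` and `top_variable_features` are hypothetical. Note that "MAD" is also widely used for the median absolute deviation, which is more robust to outliers; the quoted passage specifies the mean.

```python
from typing import Dict, List, Sequence


def mean_abs_deviation(values: Sequence[float]) -> float:
    """Mean absolute deviation of values around their mean."""
    mean = sum(values) / len(values)
    return sum(abs(v - mean) for v in values) / len(values)


def top_variable_features(expr: Dict[str, Sequence[float]], n_top: int) -> List[str]:
    """Return the n_top feature names with the largest MAD across samples.

    expr maps a feature (e.g. gene) name to its values across all samples.
    """
    ranked = sorted(expr, key=lambda f: mean_abs_deviation(expr[f]), reverse=True)
    return ranked[:n_top]
```

Because the filter ignores sample labels and cluster structure, it is fully unsupervised, but it can retain features that are highly variable for reasons unrelated to the subtypes of interest.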
“…Multi-omics clustering methods used for subtyping have included basic k-means clustering, iCluster, similarity network fusion, and consensus clustering algorithms [33,34]. Such methods work well with high-dimensional data but are equally dependent upon the input data used for clustering.…”
Section: Clustering and Feature Selection (mentioning; confidence: 99%)
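Of the methods listed in this statement, basic k-means is the simplest to illustrate. The sketch below is a minimal version of Lloyd's algorithm, which alternates between assigning each sample to its nearest centroid and recomputing each centroid as the mean of its members, applied to a samples-by-features matrix such as one restricted to the most variable features. Assumptions: squared Euclidean distance and initial centroids supplied by the caller. Practical implementations add k-means++ seeding and multiple restarts. The function names are hypothetical, and iCluster, similarity network fusion and consensus clustering are not shown.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def _sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Sequence[Point], init: Sequence[Point],
           max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's algorithm: alternate assignment and centroid update until the
    assignments stop changing (or max_iter is reached).
    Returns (labels, centroids)."""
    centroids = [list(c) for c in init]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        new_labels = [min(range(len(centroids)), key=lambda k: _sq_dist(p, centroids[k]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

Because k-means uses all input coordinates equally, its result depends directly on which features are passed in, which is the dependence on input data that the statement points out.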