2021
DOI: 10.3389/fgene.2021.632620

Comparison of Methods for Feature Selection in Clustering of High-Dimensional RNA-Sequencing Data to Identify Cancer Subtypes

Abstract: Cancer subtype identification is important to facilitate cancer diagnosis and select effective treatments. Clustering of cancer patients based on high-dimensional RNA-sequencing data can be used to detect novel subtypes, but only a subset of the features (e.g., genes) contains information related to the cancer subtype. Therefore, it is reasonable to assume that the clustering should be based on a set of carefully selected features rather than all features. Several feature selection methods have been proposed, …
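As a minimal sketch of the select-then-cluster idea in the abstract, the Python below ranks genes by variance across samples, keeps the top `k`, and runs k-means on the reduced matrix. Variance filtering and k-means are illustrative choices only, not necessarily the methods compared in the paper; the function names, the farthest-first initialisation, and the toy expression matrix are all hypothetical.

```python
from statistics import pvariance


def top_variance_features(X, k):
    """Return sorted column indices of the k features (genes) with the largest variance."""
    n_features = len(X[0])
    variances = [pvariance([row[j] for row in X]) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:k])


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, n_clusters, n_iter=100):
    """Lloyd's k-means with a deterministic farthest-first initialisation."""
    # Farthest-first init: start from the first point, then repeatedly add the
    # point farthest from all centroids chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < n_clusters:
        farthest = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(farthest))

    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(n_clusters), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(n_clusters):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical samples x genes matrix: genes 0 and 1 separate two groups,
# genes 2 to 4 carry only small noise.
X = [
    [0.1, 0.2, 1.0, 2.0, 3.0],
    [0.0, 0.1, 1.1, 2.1, 3.0],
    [0.2, 0.0, 1.0, 2.0, 3.1],
    [9.9, 10.1, 1.0, 2.1, 3.0],
    [10.0, 9.8, 1.1, 2.0, 3.1],
    [10.2, 10.0, 1.0, 2.0, 3.0],
]
selected = top_variance_features(X, k=2)
reduced = [[row[j] for j in selected] for row in X]
labels = kmeans(reduced, n_clusters=2)
print(selected, labels)  # prints [0, 1] [0, 0, 0, 1, 1, 1]
```

Clustering only the two informative genes recovers the two groups; the three noise genes never enter the distance computation, which is the motivation the abstract gives for selecting features before clustering.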

Cited by 14 publications (11 citation statements)
References 36 publications
“…HVG selection seeks to identify a subset of genes more predictive of distinct cell types than randomly expressed genes. While it is a widely used pre-processing strategy, HVG selection can struggle to account for important but lowly expressed genes or genes present in only a small fraction of cells ( Källberg et al., 2021 ). Furthermore, evaluation of various HVG methods found that different techniques show poor overlap in HVGs suggested from the same datasets and that highly expressed genes were often incorrectly flagged as HVGs ( Yip et al., 2018 ).…”
Section: Introduction | mentioning | confidence: 99%
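The two criticisms quoted above can be illustrated with a toy example. In count data the variance tends to grow with the mean, so ranking genes by raw variance favours highly expressed genes and can miss a lowly expressed gene that is present in only a few cells. The sketch below compares raw variance with the Fano factor (variance divided by mean), one simple mean-normalised dispersion; it does not reproduce any specific HVG method, and the gene names and counts are hypothetical.

```python
from statistics import mean, pvariance


def raw_variance(values):
    return pvariance(values)


def fano_factor(values):
    """Variance divided by mean; a simple mean-normalised dispersion measure."""
    m = mean(values)
    return pvariance(values) / m if m > 0 else 0.0


def rank_genes(counts, score):
    """Return gene names ordered by score, highest first."""
    return sorted(counts, key=lambda gene: score(counts[gene]), reverse=True)


# Hypothetical counts for three genes across six cells.
counts = {
    "high": [970, 1030, 1000, 1040, 960, 1000],  # highly expressed, Poisson-like noise
    "marker": [0, 0, 0, 0, 0, 30],               # lowly expressed, present in one cell
    "flat": [5, 5, 5, 5, 5, 5],                  # constant, uninformative
}

by_variance = rank_genes(counts, raw_variance)
by_fano = rank_genes(counts, fano_factor)
print(by_variance)  # ['high', 'marker', 'flat']
print(by_fano)      # ['marker', 'high', 'flat']
```

Raw variance (about 833 versus 125) ranks the highly expressed gene first, while the mean-normalised score (about 0.83 versus 25) ranks the rare marker first. Which ranking is "right" depends on the data; the point is only that the choice of dispersion measure changes which genes are selected.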
“…HVG selection seeks to identify a subset of genes more predictive of distinct cell types than randomly expressed genes. While it is a widely used pre-processing strategy, HVG selection can struggle to account for important but lowly expressed genes, or genes present in only a small fraction of cells (Källberg, Vidman, and Rydén 2021). Furthermore, evaluation of various HVG methods found that different techniques show poor overlap in HVG suggested from the same datasets, and that highly expressed genes were often incorrectly flagged as HVGs (Yip, Sham, and Wang 2018).…”
Section: Introduction | mentioning | confidence: 99%
“…Given the vast feature space of -omics datasets, creating an accurate model through penalised regression is often not difficult, however finding the right features for further study to infer biological understanding is harder. In recent years, feature selection has become a popular method for novel biomarker discovery [ 35 , 36 , 37 , 38 , 39 , 40 ] and the application of the novel feature selection methods in this paper could accelerate the discovery of biomarkers in many fields.…”
Section: Discussion | mentioning | confidence: 99%
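Penalised regression can double as feature selection. With the lasso (an L1 penalty), coefficients of uninformative features are driven to exactly zero, so the surviving features form the selected set. The sketch below is a plain cyclic coordinate-descent solver for (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 with no intercept, assuming centred feature columns. The lasso is one example of penalised regression, not necessarily the method used in the quoted paper, and the toy data, `lam` value and function names are hypothetical.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values inside [-lam, lam] become exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100, tol=1e-10):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # Correlation of feature j with the partial residual that excludes feature j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            new_value = soft_threshold(rho, lam) / z
            max_change = max(max_change, abs(new_value - beta[j]))
            beta[j] = new_value
        if max_change < tol:
            break
    return beta


# Hypothetical toy data: three orthogonal, centred features; y depends mainly on feature 0.
X = [
    [1, 1, 1],
    [-1, 1, -1],
    [1, -1, -1],
    [-1, -1, 1],
]
y = [2.1, -1.9, 1.9, -2.1]

beta = lasso_coordinate_descent(X, y, lam=0.5)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print(selected)  # [0]
```

Because the toy columns are orthogonal, each lasso coefficient is just the soft-thresholded least-squares coefficient: feature 0 keeps a shrunken coefficient of 1.5 (down from 2), while the weak signal on feature 1 (0.1) falls below `lam` and is removed.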