2016
DOI: 10.1016/j.artmed.2015.11.001
The feature selection bias problem in relation to high-dimensional gene data

Cited by 60 publications (22 citation statements)
References 21 publications
“…Selection bias has been reported when a given dataset is used for both variable selection and assessment of model performance, in turn leading to biased estimates and an increasing risk of false-positive discoveries due to overfitting (Ambroise and McLachlan, 2002; Castaldi et al , 2011; Cawley and Talbot, 2010; Krawczuk and Łukaszuk, 2016). Benefitting from the rdCV scheme, MUVR minimizes selection bias by performing variable selection and tuning of model parameters in the inner segments, followed by assessment of modelling performance using outer loop data held out of model construction and variable reduction.…”
Section: Results (confidence: 99%)
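The nested scheme described in the statement above can be sketched with scikit-learn's `GridSearchCV` nested inside `cross_val_score` (a minimal illustration, not the authors' rdCV/MUVR implementation; the variable names and the synthetic data are assumptions): the inner loop tunes how many variables to keep using only training folds, while the outer loop scores data held out of both selection and tuning.

```python
# Minimal nested cross-validation sketch (illustrative, not the rdCV/MUVR code):
# inner loop tunes the selector on training folds, outer loop scores held-out data.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import Pipeline

# Synthetic high-dimensional data: 500 "genes", only 10 informative.
X, y = make_classification(n_samples=100, n_features=500,
                           n_informative=10, random_state=0)

pipe = Pipeline([("select", SelectKBest(f_classif)),
                 ("clf", LogisticRegression(max_iter=1000))])

# Inner loop: choose the number of retained variables on training folds only.
inner = GridSearchCV(pipe, {"select__k": [5, 20, 50]}, cv=3)

# Outer loop: folds held out of both selection and tuning estimate performance.
outer_scores = cross_val_score(inner, X, y, cv=5)
print(f"outer-loop accuracy estimate: {outer_scores.mean():.2f}")
```

Because the outer folds never influence variable selection or parameter tuning, the outer-loop mean is an approximately unbiased performance estimate, which is the point the statement attributes to the rdCV scheme.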
“…It is also noteworthy that many existing variable selection techniques may suffer from selection bias, consequently inducing underestimation of error rates and leading to general model overfitting (Krawczuk and Łukaszuk, 2016). Such selection bias occurs when variable selection is carried out based on some or all of the samples used to estimate the prediction error in cross-validation scheme, which is frequently applied to optimize model parameters and to evaluate model performance (e.g.…”
Section: Introduction (confidence: 99%)
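The mechanism this statement describes, selecting variables on the same samples later used to estimate prediction error, can be demonstrated on pure-noise data (a sketch using scikit-learn, which is an assumption, not the cited papers' code): selecting features on the full dataset before cross-validation yields an apparent accuracy far above chance even though the labels are random, whereas refitting the selector inside each training fold does not.

```python
# Demonstration of selection bias on pure-noise data (illustrative sketch).
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2000))   # many "genes", few samples, no signal
y = rng.integers(0, 2, size=40)   # random class labels

# Biased: pick the 20 "best" features on ALL samples, then cross-validate.
X_sel = SelectKBest(f_classif, k=20).fit_transform(X, y)
biased = cross_val_score(LogisticRegression(max_iter=1000),
                         X_sel, y, cv=5).mean()

# Unbiased: the selector is refit inside each training fold via a pipeline.
pipe = make_pipeline(SelectKBest(f_classif, k=20),
                     LogisticRegression(max_iter=1000))
unbiased = cross_val_score(pipe, X, y, cv=5).mean()

print(f"biased CV accuracy:   {biased:.2f}")   # well above chance despite noise
print(f"unbiased CV accuracy: {unbiased:.2f}")  # typically near chance
```

The gap between the two estimates on data with no real signal is exactly the underestimation of error rates that the statement warns about.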
“…It enables to propose simple biomarkers and to smoothly distinguish between them in terms of their performance. Second, we work with the independent RT-qPCR data set that serves to validate the selected biomarkers, remove the selection bias and get an unbiased estimate of their classification accuracy (expressed in terms of AUC to compensate for unbalanced classes) [27,28].…”
Section: Methods (confidence: 99%)
“…By using a trainable multilayer artificial neural network (ANN) to replace hand-engineered features, deep learning takes the advantage of functional spectra, which are more robust and informative. In contrast, feature selection for high-dimensional data is a challenging task for conventional machine learning algorithms, which could lead to bias especially for high-throughput gene expression profiles 29 . To train a DeepCC classifier, we highly recommend employing a widely adopted molecular subtyping system, so that the deep features trained by the ANN can capture most relevant biological properties associated with each molecular subtype.…”
Section: Results (confidence: 99%)