2020
DOI: 10.1038/s41598-020-64829-0

Variable selection for inferential models with relatively high-dimensional data: Between method heterogeneity and covariate stability as adjuncts to robust selection

Abstract: Variable selection in inferential modelling is problematic when the number of variables is large relative to the number of data points, especially when multicollinearity is present. A variety of techniques have been described to identify 'important' subsets of variables from within a large parameter space, but these may produce different results, which creates difficulties with inference and reproducibility. Our aim was to evaluate the extent to which variable selection would change depending on statistical approac…

Cited by 23 publications (21 citation statements)
References 30 publications
“…Importantly, our results also confirm the recently highlighted issue that different analytic methods used on same data can yield different results 11, both in terms of variables selected and coefficient estimates 9. The simulated datasets used in this study, in which the true underlying relationships were known, were useful to illustrate such differences between methods.…”
Section: Discussion | supporting
confidence: 84%
“…Over recent years, methods have been proposed in the statistical literature to improve variable selection for inference in high dimensional data, including modifications to AIC/BIC 5, and a variety of regularisation methods based on functions that penalise model coefficients to balance over- and under-fitting (the variance-bias trade off) [6][7][8]. It has been shown, however, that different methods of variable selection can result in considerable differences in covariates selected 9 and this poses difficult questions for the researcher about which method to choose, as well as presenting wider concerns around variability of results and therefore the reproducibility of science 10,11.…”
Section: Model Selection For Inferential Models With High Dimensional | mentioning
confidence: 99%
“…As results are expected to be variable due to the high dimensionality and the comparably low number of available years, we additionally assessed the stability of suitable datasets using different variable selection approaches in a prediction context. We applied two different methods in this study, as a comparison of different approaches is suggested if modeling is performed with high dimensional data 89 . One approach, which is methodologically comparable to our correlation analysis, is the calculation of models with single stepwise forward selection based on Pearson’s correlation and 100 repeated, threefold cross-validation cf.…”
Section: Methods | mentioning
confidence: 99%
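
The statements above turn on how stable a selected set of covariates is when the data or the method changes. As a rough illustration only, and not the procedure used in the cited paper or in any of the citing papers, the sketch below estimates selection frequencies over bootstrap resamples. The selector (top-k covariates by absolute Pearson correlation with the outcome), the simulated data, and all function names are assumptions made for this example.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def select_top_k(X, y, k):
    """Hypothetical selector: indices of the k covariates with largest |correlation| to y."""
    p = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(p)]
    ranked = sorted(range(p), key=lambda j: scores[j], reverse=True)
    return set(ranked[:k])


def selection_frequency(X, y, k, n_boot, rng):
    """Fraction of bootstrap resamples in which each covariate is selected."""
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample rows with replacement
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        for j in select_top_k(Xb, yb, k):
            counts[j] += 1
    return [c / n_boot for c in counts]


# Simulated data (an assumption): only covariates 0 and 1 truly affect y.
rng = random.Random(0)
n, p = 60, 8
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [2.0 * row[0] - 1.5 * row[1] + rng.gauss(0, 1) for row in X]

freq = selection_frequency(X, y, k=2, n_boot=200, rng=rng)
for j, f in enumerate(freq):
    print(f"covariate {j}: selected in {f:.0%} of resamples")
```

Swapping `select_top_k` for a different selector and comparing the resulting frequency vectors gives a simple view of between-method differences on the same data.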