2020
DOI: 10.31234/osf.io/wc45u
Preprint

Lasso and Group Lasso with Categorical Predictors: Impact of Coding Strategy on Variable Selection and Prediction

Abstract: Machine learning methods are being increasingly adopted in psychological research. Lasso performs variable selection and regularization, and is particularly appealing to psychology researchers because of its connection to linear regression. Researchers conflate properties of linear regression with properties of lasso; however, we demonstrate that this is not the case for models with categorical predictors. Specifically, the coding strategy used for categorical predictors impacts lasso’s performance but not lin…
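
To illustrate the abstract's central claim, here is a minimal Python sketch. It is not the authors' analysis: the toy data and helper names are hypothetical, and the intercept is assumed to be unpenalized, as is common in lasso software. It fits a three-level categorical predictor by least squares under dummy coding with each possible reference level. The fitted group means are the same under every coding, but the L1 norm of the slope coefficients, which is the quantity lasso penalizes, changes with the reference level.

```python
from statistics import mean

# Hypothetical toy data: outcome values for a three-level categorical predictor.
data = {
    "A": [0.0, 0.2, -0.2],
    "B": [1.0, 1.1, 0.9],
    "C": [5.0, 5.3, 4.7],
}


def dummy_coefficients(groups, reference):
    """Least-squares coefficients under dummy coding with `reference` as baseline.

    For a single categorical predictor, the intercept equals the reference
    group's mean and each slope equals that group's mean minus the intercept.
    """
    means = {level: mean(values) for level, values in groups.items()}
    intercept = means[reference]
    slopes = {
        level: m - intercept for level, m in means.items() if level != reference
    }
    return intercept, slopes


def fitted_means(intercept, slopes, levels):
    """Predicted outcome for each level (the reference level has slope 0)."""
    return {level: intercept + slopes.get(level, 0.0) for level in levels}


def l1_penalty(slopes):
    """L1 norm of the slopes; the intercept is assumed to be unpenalized."""
    return sum(abs(b) for b in slopes.values())


results = {}
for ref in data:
    intercept, slopes = dummy_coefficients(data, ref)
    results[ref] = (fitted_means(intercept, slopes, data), l1_penalty(slopes))
    print(f"reference={ref}: fitted={results[ref][0]}, L1 penalty={results[ref][1]:.3f}")
```

Because the same least-squares fit carries a different penalty under each coding, a penalized fit has no reason to shrink toward the same solution, which is consistent with the coding sensitivity described above.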

Cited by 6 publications (4 citation statements)
References 14 publications

“…Furthermore, multilevel lasso estimation requires all variables in the model to have a mean of zero and a standard deviation of one. As this has also to apply for categorical variables and time, parameter interpretation may be less intuitive compared to standard REML-estimation (Huang & Montoya, 2020). Hence, standardized fixed effects are better extracted from a standard REML-estimation in which categorical variables and time can be coded as desired to improve interpretation (as done in our exploratory analyses below).…”
Section: Methods
Citation type: mentioning
confidence: 99%
“…For this tutorial, we used coordinate descent, the estimation procedure of the glmnet package in R (Friedman et al, 2020). Before describing the steps of creating a LASSO model, we highlight a debate about how to best represent the categories of a categorical predictor variable with more than two categories (Huang & Montoya, 2020). One option is to overparameterize the model so that a reference group is not entered into the model.…”
Section: Lasso Framework and Loss Function
Citation type: mentioning
confidence: 99%
“…Furthermore, multilevel lasso estimation requires all variables in the model to have a mean of zero and a standard deviation of one. As this has also to apply for categorical variables and time, parameter interpretation may be less intuitive compared to standard REML-estimation (Huang & Montoya, 2020).…”
Section: Multilevel Lasso Estimation
Citation type: mentioning
confidence: 99%