2016
DOI: 10.1289/ehp172

A Systematic Comparison of Linear Regression–Based Statistical Methods to Assess Exposome-Health Associations

Abstract: Background: The exposome constitutes a promising framework to improve understanding of the effects of environmental exposures on health by explicitly considering multiple testing and avoiding selective reporting. However, exposome studies are challenged by the simultaneous consideration of many correlated exposures. Objectives: We compared the performances of linear regression–based statistical methods in assessing exposome-health associations. Methods: In a simulation study, we generated 237 exposure covariates wi…

Cited by 179 publications (160 citation statements)
References 35 publications
“…One such example is elastic net [13], which offers improved effect estimates of highly correlated variables over LASSO while preserving its variable selection capability by combining it with ridge regression. The performance of elastic net has been studied via simulations within contexts similar to those in air pollution epidemiology with moderate correlation between chemical mixtures and the relationship with term birth weight [14], and high correlation between environmental factors mimicking exposure in mothers during pregnancy [15], both under AME specification linear regression.…”
Section: Results | Citation type: mentioning
confidence: 99%
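
To make the quoted point about elastic net concrete, here is a minimal sketch in plain Python. It is not code from the cited paper or from any of the citing papers. It assumes the common parameterisation (1/2n)·||y − Xb||² + λ(α·||b||₁ + (1 − α)/2·||b||₂²), solved by cyclic coordinate descent without an intercept. The function names `soft_threshold` and `elastic_net`, the toy data, and the values of `lam` and `alpha` are all illustrative assumptions.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside the dead zone."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net(X, y, lam, alpha, sweeps=1000):
    """Cyclic coordinate descent for the assumed objective
    (1/2n)*||y - Xb||^2 + lam*(alpha*||b||_1 + (1 - alpha)/2*||b||_2^2).
    No intercept is fitted, so X and y are assumed to be centred."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residuals y - Xb for the starting point b = 0
    # Mean square of each column (the curvature of the loss along coordinate j).
    col_ms = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(sweeps):
        for j in range(p):
            # Covariance of column j with the partial residual
            # (current residual with column j's own contribution added back).
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            # L1 part soft-thresholds; L2 part inflates the denominator.
            new = soft_threshold(rho, lam * alpha) / (col_ms[j] + lam * (1 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


# Toy data (hypothetical): two identical exposure columns, outcome y = 2 * x.
x = [1.0, -1.0, 2.0, -2.0]
X = [[v, v] for v in x]
y = [2.0 * v for v in x]

enet = elastic_net(X, y, lam=0.1, alpha=0.5)   # mixed L1 + L2 penalty
lasso = elastic_net(X, y, lam=0.1, alpha=1.0)  # pure L1 penalty
print("elastic net:", enet)
print("lasso:      ", lasso)
```

In this toy case the elastic net run splits the effect evenly between the two identical columns, whereas the pure-L1 run, with this cyclic update order, leaves nearly all of the weight on the first column. That is one simple way to see why the ridge component can stabilise estimates for highly correlated exposures.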
“…There are also more evaluations of linear regression than of other link functions, as they are generally computationally less demanding and have relatively high power. In a recent simulation based on a low-dimensional pregnancy exposome data set (p = 237 exposures and q = 0–25 truly associated) and linear modelling, Agier et al. [7] found that deletion/substitution/addition [36] and a stochastic search algorithm, Graphical Unit Evolutionary Stochastic Search [37], performed best; we did not include these methods because either a current R package was not available or the method was too computationally demanding for a large simulation study, and the latter method cannot be applied in a logistic regression framework. They also found that elastic net performed reasonably well.…”
Section: Discussion | Citation type: mentioning
confidence: 99%
“…Recently published simulation analyses have evaluated methods for the analysis of continuous outcomes considering data structures inspired by the pregnancy exposome [7] and air pollution epidemiology [8]. We extended this work by (1) studying a binary outcome, as much of epidemiological research deals with data from case–control studies and presence/absence of disease, (2) focusing on a larger set of simulation scenarios, and (3) by exploring in what way different characteristics of the simulated exposure matrix and the strength of the exposure–outcome association affect the performance of variable selection methods.…”
Citation type: mentioning
confidence: 99%
“…A similar situation has been described in the context of the National Health and Nutrition Examination Survey [18]. In this context, the EWAS approach is not able to untangle the associations that are significant because the exposure is a causal exposure truly affecting the health outcome from the associations that are significant because the exposure is correlated to one of the causal exposures; this happens even if a multiple linear regression step including all potential "hits" is added after the EWAS step [19]. This can lead to a high rate of false positives.…”
Section: Questionnaires | Citation type: mentioning
confidence: 95%
“…However, systematic simulation studies are needed to characterise the efficiency of these approaches in the context of the exposome. A simulation study conducted in the frame of the HELIX and EXPOsOMICS projects showed that, due to the correlation within the exposome, the linear regression-based statistical methods that were investigated (one for each of the large families of variable selection approaches applied to regression-based methods) were only moderately efficient to differentiate true predictors from correlated covariates [19]. Further issues relate to measurement error and the identification of synergistic effects between exposures, for which it remains unclear which statistical model would perform best and which sample size would be required in exposome studies to provide sufficient statistical power.…”
Section: Questionnaires | Citation type: mentioning
confidence: 99%
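
The EWAS limitation described in the two statements above can be illustrated with a small simulation. This is a hedged sketch, not the simulation design of the cited study. It assumes blocks of exposures that share a latent factor (within-block correlation `rho`), a single causal exposure, one-at-a-time correlation tests with a normal approximation to the test statistic, and a Bonferroni threshold. The function `simulate_ewas`, its parameters, and the scaled-down size of 30 exposures (instead of the 237 mentioned in the abstract) are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist


def pearson(a, b):
    """Sample Pearson correlation between two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def simulate_ewas(n=500, n_blocks=6, block_size=5, rho=0.8,
                  true_idx=(0,), beta=1.0, seed=1):
    """Simulate correlated exposures, then test each exposure on its own."""
    rng = random.Random(seed)
    p = n_blocks * block_size
    X, y = [], []
    for _ in range(n):
        row = []
        for _block in range(n_blocks):
            shared = rng.gauss(0, 1)  # latent factor shared within the block
            for _k in range(block_size):
                own = rng.gauss(0, 1)
                row.append(math.sqrt(rho) * shared + math.sqrt(1 - rho) * own)
        X.append(row)
        # Only the exposures in true_idx causally affect the outcome.
        y.append(beta * sum(row[j] for j in true_idx) + rng.gauss(0, 1))

    threshold = 0.05 / p  # Bonferroni correction over p single-exposure tests
    hits = []
    for j in range(p):
        r = pearson([row[j] for row in X], y)
        # Test statistic for a single-exposure regression slope;
        # the normal approximation is an assumption that is reasonable for n = 500.
        z = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
        p_value = 2 * (1 - NormalDist().cdf(abs(z)))
        if p_value < threshold:
            hits.append(j)

    tp = sum(1 for j in hits if j in true_idx)
    fp = len(hits) - tp
    return {
        "hits": hits,
        "sensitivity": tp / len(true_idx),
        "fdp": fp / len(hits) if hits else 0.0,  # false discovery proportion
    }


result = simulate_ewas()
print(result)
```

With these settings the causal exposure is detected, but so are the non-causal exposures in the same correlated block, so most of the "hits" are false positives. Lowering `rho` toward 0 in this sketch should reduce that effect, which matches the role the quoted statements assign to correlation within the exposome.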