The usual methods for analyzing case–cohort studies rely on weighted estimators that are sometimes not fully efficient. Multiple imputation might be a good alternative because it uses all the data available and approximates the maximum partial likelihood estimator. This method is based on the generation of several plausible complete data sets, taking into account uncertainty about missing values. When the imputation model is correctly defined, the multiple imputation estimator is asymptotically unbiased and its variance is correctly estimated. We show that a correct imputation model must be estimated from the fully observed data (cases and controls), including the case status among the explanatory variables. To validate the approach, we analyzed case–cohort studies first with completely simulated data and then with case–cohort data sampled from two real cohorts. The analyses of simulated data showed that, when the imputation model was correct, the multiple imputation estimator was unbiased and efficient. The observed gain in precision ranged from 8 to 37 per cent for phase‐1 variables and from 5 to 19 per cent for the phase‐2 variable. When the imputation model was misspecified, the multiple imputation estimator was still more efficient than the weighted estimators but it was also slightly biased. The analyses of case–cohort data sampled from complete cohorts showed that even when no strong predictor of the phase‐2 variable was available, the multiple imputation estimator was unbiased, as precise as the weighted estimator for the phase‐2 variable and slightly more precise than the weighted estimators for the phase‐1 variables. However, the multiple imputation estimator was found to be biased when, because of interaction terms, some coefficients of the imputation model had to be estimated from small samples. Multiple imputation is an efficient technique for analyzing case–cohort data.
Practically, we suggest building the analysis model using only the case–cohort data and weighted estimators. Multiple imputation can then be used to reanalyze the data using the selected model in order to improve the precision of the results. Copyright © 2011 John Wiley & Sons, Ltd.
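The abstract above does not say how the estimates from the several completed data sets are combined. A minimal sketch, assuming the standard combining rules (Rubin's rules) are used, is given below. The function name pool_rubin and the numeric inputs are hypothetical and serve only as an illustration; they are not results from the paper.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine M point estimates and their variances with Rubin's rules.

    Assumption: this is the standard pooling step of multiple imputation,
    not necessarily the exact procedure used by the authors.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    pooled = mean(estimates)        # pooled point estimate
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (M - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical log hazard ratios and variances from M = 5 completed data sets
betas = [0.40, 0.50, 0.45, 0.55, 0.60]
beta_vars = [0.010, 0.012, 0.011, 0.009, 0.008]
pooled_beta, pooled_var = pool_rubin(betas, beta_vars)
```

The total variance adds the between-imputation spread to the average within-imputation variance, which is how the uncertainty about the missing values enters the final standard error.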
Statistical analyses of longitudinal data with drop-outs based on direct likelihood, and using all the available data, provide unbiased and fully efficient estimates under some assumptions about the drop-out mechanism. Unfortunately, these assumptions can never be tested from the data. Thus, sensitivity analyses should be routinely performed to assess the robustness of inferences to departures from these assumptions. However, each specific scientific context requires different considerations when setting up such an analysis; no standard method exists, and this is still an active area of research. We propose a flexible procedure to perform sensitivity analyses when dealing with continuous outcomes, which are described by a linear mixed model in an initial likelihood analysis. The methodology relies on the pattern-mixture model factorisation of the full data likelihood and was validated in a simulation study. The approach was prompted by a randomised clinical trial for sleep-maintenance insomnia treatment. This case study illustrated the practical value of our approach and underlined the need for sensitivity analyses when analysing data with drop-outs: some of the conclusions from the initial analysis were shown to be reliable, while others were found to be fragile and strongly dependent on modelling assumptions. R code for implementation is provided.
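For readers unfamiliar with the pattern-mixture factorisation mentioned above, its generic form can be written as follows. The notation is an assumption introduced here for illustration (observed outcomes, missing outcomes and drop-out pattern of subject i); the abstract itself gives no formula.

```latex
f\left(y_i^{\mathrm{obs}}, y_i^{\mathrm{mis}}, r_i\right)
  = f(r_i)\,
    f\left(y_i^{\mathrm{obs}} \mid r_i\right)\,
    f\left(y_i^{\mathrm{mis}} \mid y_i^{\mathrm{obs}}, r_i\right)
```

The last factor cannot be identified from the observed data, which is why a sensitivity analysis varies the assumptions placed on it.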
Background: The use of structural equation modeling and latent variables remains uncommon in epidemiology despite its potential usefulness. This usefulness was illustrated by studying cross-sectional and longitudinal relationships between eating behavior and adiposity, using four different indicators of fat mass.
Methods: Using data from a longitudinal community-based study, we fitted structural equation models including two latent variables (baseline adiposity and adiposity change after 2 years of follow-up, respectively), each defined by the following four anthropometric measurements (or by their changes, respectively): body mass index, waist circumference, skinfold thickness and percent body fat. Latent adiposity variables were hypothesized to depend on a cognitive restraint score, calculated from answers to an eating-behavior questionnaire (TFEQ-18), either cross-sectionally or longitudinally.
Results: We found that high baseline adiposity was associated with a 2-year increase of the cognitive restraint score, whereas no convincing relationship between baseline cognitive restraint and 2-year adiposity change could be established.
Conclusions: The latent variable modeling approach enabled presentation of synthetic results rather than separate regression models and detailed analysis of the causal effects of interest. In the general population, restrained eating appears to be an adaptive response of subjects prone to gaining weight more than a risk factor for fat-mass increase.
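The structure described in the Methods can be sketched as a generic one-factor measurement model with a structural regression. All symbols are assumptions introduced for illustration; the abstract does not give the authors' exact specification (for example, error covariances or additional covariates).

```latex
% Measurement part: indicator j (BMI, waist circumference, skinfold
% thickness, percent body fat) of subject i loads on latent adiposity \eta_i
x_{ij} = \nu_j + \lambda_j \eta_i + \varepsilon_{ij}, \qquad j = 1, \dots, 4
% Structural part: latent adiposity depends on the cognitive restraint score z_i
\eta_i = \alpha + \gamma z_i + \zeta_i
```

Under this sketch a single coefficient summarises the relationship between cognitive restraint and adiposity, instead of four separate regressions, one per anthropometric indicator.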