Objectives: Epidemiologic studies often suffer from incomplete data, measurement error (or misclassification), and confounding. Each of these can cause bias and imprecision in estimates of exposure-outcome relations. We describe and compare statistical approaches that aim to control all three sources of bias simultaneously. Study Design and Setting: We illustrate four statistical approaches that address all three sources of bias, namely, multiple imputation for missing data and measurement error, multiple imputation combined with regression calibration, full information maximum likelihood within a structural equation modeling framework, and a Bayesian model. In a simulation study, we assess the performance of the four approaches compared with more commonly used approaches that do not account for measurement error, missing values, or confounding. Results: The results demonstrate that the four approaches consistently outperform the alternative approaches on all performance metrics (bias, mean squared error, and confidence interval coverage). Even in simulated data of 100 subjects, these approaches perform well. Conclusion: There can be a large benefit of addressing measurement error, missing values, and confounding to improve the estimation of exposure-outcome relations, even when the available sample size is relatively small.
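As an illustration of one of the four approaches named above (multiple imputation combined with regression calibration), the sketch below shows how the idea could be coded in Python with statsmodels. It assumes a simple simulated data set with a continuous outcome, a partly missing confounder, and an internal validation subset in which the true exposure is observed; the variable names, the simulation, and the use of MICEData for the imputation step are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch: multiple imputation (for the missing confounder) combined with
# regression calibration (for the error-prone exposure), pooled by Rubin's rules.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.imputation.mice import MICEData

rng = np.random.default_rng(0)
n = 1000
z = rng.normal(size=n)                               # confounder
x_true = 0.5 * z + rng.normal(size=n)                # true exposure
x_err = x_true + rng.normal(scale=0.8, size=n)       # error-prone measurement
y = 0.3 * x_true + 0.4 * z + rng.normal(size=n)      # continuous outcome

df = pd.DataFrame({"y": y, "x_err": x_err, "z": z, "x_true": x_true})
df.loc[rng.random(n) >= 0.2, "x_true"] = np.nan      # true exposure known in a ~20% validation subset
df.loc[rng.random(n) < 0.1, "z"] = np.nan            # confounder partly missing

# Regression calibration model fitted on complete validation rows: E[X | X*, Z]
val = df.dropna(subset=["x_true", "z"])
calib = sm.OLS(val["x_true"], sm.add_constant(val[["x_err", "z"]])).fit()

# Multiple imputation (chained equations) for the partly missing confounder
M = 10
imp = MICEData(df[["y", "x_err", "z"]].copy())
estimates, variances = [], []
for _ in range(M):
    imp.update_all()                                  # draw one completed data set
    d = imp.data.copy()
    d["x_calib"] = calib.predict(sm.add_constant(d[["x_err", "z"]]))
    fit = sm.OLS(d["y"], sm.add_constant(d[["x_calib", "z"]])).fit()
    estimates.append(fit.params["x_calib"])
    variances.append(fit.bse["x_calib"] ** 2)

# Rubin's rules to pool the exposure effect across the M imputed data sets
q_bar, u_bar, b = np.mean(estimates), np.mean(variances), np.var(estimates, ddof=1)
se_pooled = np.sqrt(u_bar + (1 + 1 / M) * b)
print(f"pooled exposure-outcome estimate: {q_bar:.3f} (SE {se_pooled:.3f})")
```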
Data mining and machine learning techniques such as classification and regression trees (CART) represent a promising alternative to conventional logistic regression for propensity score estimation. Whereas incomplete data preclude the fitting of a logistic regression on all subjects, CART is appealing in part because some implementations allow for incomplete records to be incorporated in the tree fitting and provide propensity score estimates for all subjects. Based on theoretical considerations, we argue that the automatic handling of missing data by CART may, however, not be appropriate. Using a series of simulation experiments, we examined the performance of different approaches to handling missing covariate data: (i) applying the CART algorithm directly to the (partially) incomplete data, (ii) complete case analysis, and (iii) multiple imputation. Performance was assessed in terms of bias in estimating exposure-outcome effects among the exposed, standard error, mean squared error, and coverage. Applying the CART algorithm directly to incomplete data resulted in bias, even in scenarios where data were missing completely at random. Overall, multiple imputation followed by CART resulted in the best performance. Our study showed that automatic handling of missing data in CART can cause serious bias and does not outperform multiple imputation as a means to account for missing data.
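A minimal sketch of the best-performing strategy described here (multiple imputation followed by CART propensity scores) is given below. It assumes a simulated data set with a missing-completely-at-random covariate and uses scikit-learn's IterativeImputer with posterior sampling as a stand-in for multiple imputation; the ATT weighting and all variable names are illustrative assumptions rather than the study's actual code.

```python
# Sketch: multiple imputation, then a CART propensity score model, then
# weighting to estimate the exposure effect among the exposed (ATT).
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
n = 2000
x1, x2 = rng.normal(size=n), rng.normal(size=n)                  # covariates
a = rng.binomial(1, 1 / (1 + np.exp(-(0.6 * x1 + 0.4 * x2))))    # exposure
y = 1.0 * a + 0.8 * x1 + 0.5 * x2 + rng.normal(size=n)           # outcome
x1_obs = np.where(rng.random(n) < 0.3, np.nan, x1)               # MCAR missingness in x1
X = pd.DataFrame({"x1": x1_obs, "x2": x2})

att_estimates = []
for m in range(10):                                   # 10 "imputations"
    imputer = IterativeImputer(sample_posterior=True, random_state=m)
    X_imp = imputer.fit_transform(X)
    # CART propensity score model fitted on the completed covariates
    ps = DecisionTreeClassifier(min_samples_leaf=50, random_state=m)
    ps.fit(X_imp, a)
    p = np.clip(ps.predict_proba(X_imp)[:, 1], 0.01, 0.99)
    # ATT weights: 1 for the exposed, p / (1 - p) for the unexposed
    w = np.where(a == 1, 1.0, p / (1 - p))
    att = np.average(y[a == 1]) - np.average(y[a == 0], weights=w[a == 0])
    att_estimates.append(att)

print(f"pooled ATT point estimate: {np.mean(att_estimates):.3f}")
```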
Aims: Multiple risk scores to predict ischaemic stroke (IS) in patients with atrial fibrillation (AF) have been developed. This study aims to systematically review these scores, their validations and updates, assess their methodological quality, and calculate pooled estimates of the predictive performance. Methods and results: We searched PubMed and Web of Science for studies developing, validating, or updating risk scores for IS in AF patients. Methodological quality was assessed using the Prediction model Risk Of Bias ASsessment Tool (PROBAST). To assess discrimination, pooled c-statistics were calculated using random-effects meta-analysis. We identified 19 scores, which were validated and updated once or more in 70 and 40 studies, respectively, including 329 validations and 76 updates, nearly all on the CHA2DS2-VASc and CHADS2. Pooled c-statistics were calculated among 6 267 728 patients and 359 373 events of IS. For the CHA2DS2-VASc and CHADS2, pooled c-statistics were 0.644 [95% confidence interval (CI) 0.635–0.653] and 0.658 (0.644–0.672), respectively. Better discriminatory abilities were found in the newer risk scores, with the modified-CHADS2 demonstrating the best discrimination [c-statistic 0.715 (0.674–0.754)]. Updates were found for the CHA2DS2-VASc and CHADS2 only, showing improved discrimination. Calibration was reasonable but available for only 17 studies. The PROBAST indicated a risk of methodological bias in all studies. Conclusion: Nineteen risk scores and 76 updates are available to predict IS in patients with AF. The guideline-endorsed CHA2DS2-VASc shows inferior discriminative abilities compared with newer scores. Additional external validations and data on calibration are required before considering the newer scores in clinical practice. Clinical trial registration: ID CRD4202161247 (PROSPERO).
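The pooled c-statistics reported above could, for example, be obtained with DerSimonian-Laird random-effects pooling on the logit scale, as sketched below. The example c-statistics and standard errors are invented and do not reproduce the review's data, and the choice of the logit scale is an assumption about how such pooling might be done.

```python
# Sketch: random-effects (DerSimonian-Laird) pooling of c-statistics.
import numpy as np
from scipy.special import expit, logit

def pool_c_statistics(c, se):
    """Pool c-statistics on the logit scale and return (pooled c, 95% CI)."""
    c, se = np.asarray(c, dtype=float), np.asarray(se, dtype=float)
    theta = logit(c)
    var = (se / (c * (1 - c))) ** 2           # delta-method variance on the logit scale
    w = 1 / var
    theta_fixed = np.sum(w * theta) / np.sum(w)
    q = np.sum(w * (theta - theta_fixed) ** 2)
    df = len(c) - 1
    tau2 = max(0.0, (q - df) / (np.sum(w) - np.sum(w**2) / np.sum(w)))  # between-study variance
    w_re = 1 / (var + tau2)
    theta_re = np.sum(w_re * theta) / np.sum(w_re)
    se_re = np.sqrt(1 / np.sum(w_re))
    ci = expit([theta_re - 1.96 * se_re, theta_re + 1.96 * se_re])
    return expit(theta_re), ci

# Illustrative validation results for a single score (not the review's data)
pooled, (lo, hi) = pool_c_statistics(c=[0.63, 0.66, 0.64, 0.70], se=[0.01, 0.02, 0.015, 0.03])
print(f"pooled c-statistic {pooled:.3f} (95% CI {lo:.3f}-{hi:.3f})")
```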
Objective: Article full texts are often inaccessible via the standard search engines of biomedical literature, such as PubMed and Embase, which are commonly used for systematic reviews. Excluding the full-text bodies from a literature search may result in a small or selective subset of articles being included in the review because of the limited information that is available in only the title, abstract, and keywords. This article describes a comparison of search strategies based on a systematic literature review of all articles published in 5 top-ranked epidemiology journals between 2000 and 2017. Study Design and Setting: Based on a text-mining approach, we studied how nine different methodological topics were mentioned across text fields (title, abstract, keywords, and text body). The following methodological topics were studied: propensity score methods, inverse probability weighting, marginal structural modeling, multiple imputation, Kaplan-Meier estimation, number needed to treat, measurement error, randomized controlled trial, and latent class analysis. Results: In total, 31,641 Hypertext Markup Language (HTML) files were downloaded from the journals' websites. For all methodological topics and journals, at most 50% of articles with a mention of a topic in the text body also mentioned the topic in the title, abstract, or keywords. For several topics, reporting in the title, abstract, or keywords gradually decreased over calendar time. Conclusion: Literature searches based on title, abstract, and keywords alone may not be sufficiently sensitive for studies of epidemiological research practice. This study also illustrates the potential value of full-text literature searches, provided that full-text bodies are accessible for searching.
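The kind of text-mining comparison described here could be approximated along the lines of the sketch below, which counts, per topic, how often a mention in the article body is also visible in the title or abstract. The folder layout, the HTML tag patterns, and the shortened topic list are illustrative assumptions and not the authors' pipeline.

```python
# Sketch: compare topic mentions in title/abstract versus the full-text body
# of locally downloaded HTML articles.
import re
from collections import Counter
from pathlib import Path

TOPICS = {
    "propensity score": r"propensity score",
    "multiple imputation": r"multiple imputation",
    "measurement error": r"measurement error",
}

def extract_fields(html: str) -> dict:
    """Crude field extraction; assumes title/abstract/body live in these tags."""
    def grab(pattern):
        m = re.search(pattern, html, flags=re.S | re.I)
        return m.group(1).lower() if m else ""
    front = grab(r"<title>(.*?)</title>") + " " + grab(r'<section class="abstract">(.*?)</section>')
    return {"front": front, "body": grab(r"<body>(.*?)</body>")}

counts = Counter()
for path in Path("html_articles").glob("*.html"):        # assumed download folder
    fields = extract_fields(path.read_text(errors="ignore"))
    for topic, pattern in TOPICS.items():
        if re.search(pattern, fields["body"]):
            counts[(topic, "body")] += 1
            if re.search(pattern, fields["front"]):
                counts[(topic, "front")] += 1

for topic in TOPICS:
    body_n = counts[(topic, "body")]
    front_n = counts[(topic, "front")]
    share = front_n / body_n if body_n else float("nan")
    print(f"{topic}: {front_n}/{body_n} body mentions also visible in title/abstract ({share:.0%})")
```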
Purpose: In studies of effects of time-varying drug exposures, adequate adjustment for time-varying covariates is often necessary to properly control for confounding. However, the granularity of the available covariate data may not be sufficiently fine, for example when covariates are measured for participants only when their exposure levels change. Methods: To illustrate the impact of choices regarding the frequency of measuring time-varying covariates, we simulated data for a large target trial and for large observational studies, varying in covariate measurement design. Covariates were measured never, on a fixed-interval basis, or each time the exposure level switched. For the analysis, it was assumed that covariates remain constant in periods of no measurement. Cumulative survival probabilities for continuous exposure and non-exposure were estimated using inverse probability weighting to adjust for time-varying confounding, with special emphasis on the difference between 5-year event risks. Results: With monthly covariate measurements, estimates based on observational data coincided with trial-based estimates, with 5-year risk differences being zero. Without measurement of baseline or post-baseline covariates, this risk difference was estimated to be 49% based on the available observational data. With measurements on a fixed-interval basis only, 5-year risk differences deviated from the null, to 29% for 6-monthly measurements, and with magnitude increasing up to 35% as the interval length increased. Risk difference estimates diverged from the null to as low as −18% when covariates were measured depending on exposure level switching. Conclusion: Our simulations highlight the need for careful consideration of time-varying covariates in designing studies on time-varying exposures. We caution against implementing designs with long intervals between measurements. The maximum length required will depend on the rates at which treatments and covariates change, with higher rates requiring shorter measurement intervals.
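The weighting approach referred to above can be illustrated with the sketch below, which builds stabilized inverse probability of treatment weights for a time-varying exposure while a sparsely measured covariate is carried forward between measurement occasions. The long-format data layout, the simulation, and the pooled logistic models are illustrative assumptions, not the authors' code.

```python
# Sketch: stabilized inverse probability weights with a time-varying covariate
# that is measured only every 6 intervals and carried forward in between.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(2)
n, t_max = 500, 24                                    # subjects, monthly intervals
rows = []
for i in range(n):
    l = rng.normal()                                  # time-varying covariate
    a_prev = 0
    for t in range(t_max):
        if t % 6 == 0:                                # measured only every 6 months
            l_obs = l
        a = rng.binomial(1, 1 / (1 + np.exp(-(0.8 * l - 0.5 * a_prev))))
        rows.append({"id": i, "t": t, "l_obs": l_obs, "a": a, "a_prev": a_prev})
        l = 0.7 * l + 0.3 * a + rng.normal(scale=0.5) # covariate responds to exposure
        a_prev = a
df = pd.DataFrame(rows)

# Pooled logistic models for the treatment process: the denominator model uses the
# (carried-forward) covariate history, the numerator only the past exposure.
den = sm.Logit(df["a"], sm.add_constant(df[["l_obs", "a_prev"]])).fit(disp=0)
num = sm.Logit(df["a"], sm.add_constant(df[["a_prev"]])).fit(disp=0)
p_den = np.where(df["a"] == 1, den.predict(), 1 - den.predict())
p_num = np.where(df["a"] == 1, num.predict(), 1 - num.predict())

# Stabilized weights: cumulative product of numerator/denominator over time per subject
df["sw"] = p_num / p_den
df["sw"] = df.groupby("id")["sw"].cumprod()
print(df["sw"].describe())   # these weights would then feed a weighted outcome/survival model
```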