Persons living with HIV engage in routine clinical care, generating large amounts of data in observational HIV cohorts. These data are often error-prone, and using them directly in biomedical research could bias estimation and give misleading results. A cost-effective solution is the two-phase design, under which the error-prone variables are observed for all patients during Phase I and that information is used to select patients for data auditing during Phase II. For example, the Caribbean, Central, and South America network for HIV epidemiology (CCASAnet) selected a random sample from each site for data auditing. Herein, we consider efficient odds ratio estimation with partially audited, error-prone data. We propose a semiparametric approach that uses all information from both phases and accommodates a number of error mechanisms. We allow both the outcome and the covariates to be error-prone, allow these errors to be correlated, and allow selection of the Phase II sample to depend on Phase I data in an arbitrary manner. We devise a computationally efficient, numerically stable EM algorithm to obtain estimators that are consistent, asymptotically normal, and asymptotically efficient. We demonstrate the advantages of the proposed methods over existing ones through extensive simulations. Finally, we provide applications to the CCASAnet cohort.
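The semiparametric estimator itself is involved, but the two-phase structure it exploits can be illustrated with a toy EM for a misclassified binary outcome. The sketch below is an illustration only, not the proposed method: it assumes an error-free covariate, non-differential outcome misclassification with sensitivity and specificity plugged in from the audited Phase II records, and ignorable audit selection; all variable names and settings are hypothetical.

```python
"""Toy two-phase EM sketch (illustration only, not the paper's semiparametric method)."""
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(0)

# ---- simulate a two-phase data set (hypothetical sizes and parameters) ----
n = 5000
x = rng.normal(size=n)
y_true = rng.binomial(1, expit(-1.0 + 0.7 * x))             # true outcome
sens, spec = 0.85, 0.95
y_err = np.where(y_true == 1,
                 rng.binomial(1, sens, n),                   # error-prone outcome
                 rng.binomial(1, 1 - spec, n))
audited = rng.random(n) < 0.10                               # Phase II audit indicator

# ---- plug-in sensitivity/specificity estimated from the audited records ----
se_hat = y_err[audited & (y_true == 1)].mean()
sp_hat = 1 - y_err[audited & (y_true == 0)].mean()

def neg_expected_loglik(beta, w):
    """Expected complete-data negative log-likelihood given weights w = E[y | data]."""
    p = expit(beta[0] + beta[1] * x)
    return -np.sum(w * np.log(p) + (1 - w) * np.log(1 - p))

# ---- EM iterations ----
beta = np.zeros(2)
for _ in range(50):
    # E-step: posterior probability that the true outcome is 1 for unaudited records
    p = expit(beta[0] + beta[1] * x)
    num = np.where(y_err == 1, se_hat * p, (1 - se_hat) * p)
    den = num + np.where(y_err == 1, (1 - sp_hat) * (1 - p), sp_hat * (1 - p))
    w = num / den
    w[audited] = y_true[audited]           # audited records contribute their true outcome
    # M-step: fractionally weighted logistic regression via direct optimisation
    beta = minimize(neg_expected_loglik, beta, args=(w,), method="BFGS").x

print("estimated log odds ratio for x:", beta[1])            # simulation target: 0.7
```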
Objective: To develop and validate algorithms for predicting 30-day fatal and nonfatal opioid-related overdose using statewide data sources, including prescription drug monitoring program data, Hospital Discharge Data System data, and Tennessee (TN) vital records. Current overdose prevention efforts in TN rely on descriptive and retrospective analyses without prognostication. Materials and Methods: Study data included 3 041 668 TN patients with 71 479 191 controlled substance prescriptions from 2012 to 2017. Statewide data and socioeconomic indicators were used to train, ensemble, and calibrate 10 nonparametric “weak learner” models. Validation was performed using the area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve, risk concentration, and the Spiegelhalter z-test statistic. Results: Within 30 days, 2574 fatal overdoses occurred after 4912 prescriptions (0.0069%) and 8455 nonfatal overdoses occurred after 19 460 prescriptions (0.027%). Discrimination and calibration improved after ensembling (AUROC: 0.79–0.83; Spiegelhalter P value: 0–.12). Risk concentration captured 47–52% of cases in the top quantiles of predicted probabilities. Discussion: Partitioning and ensembling enabled all study data to be used given computational limits and helped mitigate case imbalance. Predicting risk at the prescription level can aggregate risk to the patient, provider, pharmacy, county, and regional levels. Implementing these models into Tennessee Department of Health systems might enable more granular risk quantification. Prospective validation with more recent data is needed. Conclusion: Predicting opioid-related overdose risk at statewide scales remains difficult, and models like these, which required a partnership between an academic institution and a state health agency to develop, may complement traditional epidemiological methods of risk identification and inform public health decisions.
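The partition, ensemble, and calibrate workflow described above can be sketched as follows. This is a hypothetical illustration on synthetic imbalanced data, not the Tennessee Department of Health pipeline: the learners, partitioning scheme, and isotonic calibration step are stand-ins, and only the evaluation metrics named in the abstract (AUROC, area under the precision-recall curve, Spiegelhalter z) follow their standard definitions.

```python
"""Sketch of partitioned weak learners, ensembling, calibration, and validation metrics."""
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.isotonic import IsotonicRegression
from sklearn.metrics import roc_auc_score, average_precision_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical imbalanced data standing in for prescription-level records.
X, y = make_classification(n_samples=50_000, n_features=20, weights=[0.99], random_state=0)
X_tr, X_rest, y_tr, y_rest = train_test_split(X, y, test_size=0.4, stratify=y, random_state=0)
X_cal, X_te, y_cal, y_te = train_test_split(X_rest, y_rest, test_size=0.5,
                                            stratify=y_rest, random_state=0)

# Train several "weak learners" on disjoint partitions of the training data.
parts = np.array_split(rng.permutation(len(y_tr)), 10)
probs_cal, probs_te = [], []
for idx in parts:
    clf = RandomForestClassifier(n_estimators=50, max_depth=5, random_state=0)
    clf.fit(X_tr[idx], y_tr[idx])
    probs_cal.append(clf.predict_proba(X_cal)[:, 1])
    probs_te.append(clf.predict_proba(X_te)[:, 1])

# Ensemble by averaging, then calibrate on a held-out split.
p_cal, p_te = np.mean(probs_cal, axis=0), np.mean(probs_te, axis=0)
iso = IsotonicRegression(out_of_bounds="clip").fit(p_cal, y_cal)
p_te_cal = iso.predict(p_te)

def spiegelhalter_z(y, p):
    """Spiegelhalter's z statistic for calibration (values near 0 indicate good calibration)."""
    p = np.clip(p, 1e-6, 1 - 1e-6)                     # guard against exact 0/1 after isotonic
    num = np.sum((y - p) * (1 - 2 * p))
    den = np.sqrt(np.sum((1 - 2 * p) ** 2 * p * (1 - p)))
    return num / den

print("AUROC:          ", roc_auc_score(y_te, p_te_cal))
print("AUPRC:          ", average_precision_score(y_te, p_te_cal))
print("Spiegelhalter z:", spiegelhalter_z(y_te, p_te_cal))
```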
Measurement errors are present in many data collection procedures and can harm analyses by biasing estimates. To correct for measurement error, researchers often validate a subsample of records and then incorporate the information learned from this validation sample into estimation. In practice, the validation sample is often selected using simple random sampling (SRS). However, SRS leads to inefficient estimates because it ignores information in the error-prone variables, which can be highly correlated with the unknown truth. Applying and extending ideas from the two-phase sampling literature, we propose optimal and nearly optimal designs for selecting the validation sample in the classical measurement-error framework. We target designs to improve the efficiency of model-based and design-based estimators and show how the resulting designs compare to each other. Our results suggest that sampling schemes that extract more information from the error-prone data are substantially more efficient than SRS for both design- and model-based estimators. The optimal procedure, however, depends on the analysis method and can differ substantially between methods; this is supported by both theory and simulations. We illustrate the various designs using data from an HIV cohort study.
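As a rough illustration of why designs that exploit the error-prone data can beat SRS, the simulation sketch below compares an SRS validation sample with one that over-samples the extreme tails of the error-prone covariate, using regression calibration as the analysis method. It is a simplified stand-in for the optimal designs studied here; the sample sizes, error variance, and tail-sampling rule are arbitrary choices, not the paper's design.

```python
"""Simulation sketch: SRS vs. an extreme-tails validation design under classical measurement error."""
import numpy as np

rng = np.random.default_rng(0)
n, n_val, reps = 2000, 200, 500
beta0, beta1 = 1.0, 0.5

def estimate(design):
    """One regression-calibration estimate of beta1 under a given validation design."""
    x = rng.normal(size=n)
    x_err = x + rng.normal(scale=1.0, size=n)           # classical measurement error
    y = beta0 + beta1 * x + rng.normal(scale=1.0, size=n)
    if design == "srs":
        val = rng.choice(n, n_val, replace=False)
    else:                                               # over-sample extreme tails of x_err
        order = np.argsort(x_err)
        val = np.concatenate([order[: n_val // 2], order[-(n_val // 2):]])
    # Calibration model E[X | X*] fit on validated records, then applied to everyone.
    a1, a0 = np.polyfit(x_err[val], x[val], 1)
    x_hat = a0 + a1 * x_err
    b1, _ = np.polyfit(x_hat, y, 1)
    return b1

for design in ("srs", "extremes"):
    est = np.array([estimate(design) for _ in range(reps)])
    print(f"{design:8s} mean = {est.mean():.3f}   sd = {est.std():.3f}")
```

Both designs recover the true slope on average, but the extreme-tails design yields a noticeably smaller Monte Carlo standard deviation because the validated records carry more information about the calibration slope.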
Introduction: Audits play a critical role in maintaining the integrity of observational cohort data. While previous work has validated the audit process, sending trained auditors to sites (“travel-audits”) can be costly. We investigate the efficacy of training sites to conduct “self-audits.” Methods: In 2017, eight research groups in the Caribbean, Central, and South America network for HIV Epidemiology each audited a subset of their patient records randomly selected by the data coordinating center at Vanderbilt. Designated investigators at each site compared abstracted research data to the original clinical source documents and captured audit findings electronically. Additionally, two Vanderbilt investigators performed on-site travel-audits at three randomly selected sites (one adult and two pediatric) in late summer 2017. Results: Self- and travel-auditors, respectively, reported that 93% and 92% of 8919 data entries, captured across 28 unique clinical variables on 65 patients, were entered correctly. Across all entries, 8409 (94%) received the same assessment from self- and travel-auditors (7988 correct and 421 incorrect). Of the 421 entries mutually assessed as “incorrect,” 304 (72%) were corrected by both self- and travel-auditors, and 250 of these (82%) received the same corrections. Reason for changing antiretroviral therapy (ART) regimen, ART end date, viral load value, CD4%, and HIV diagnosis date had the most mismatched corrections. Conclusions: With similar overall error rates, findings suggest that data audits conducted by trained local investigators could provide an alternative to on-site audits by external auditors to ensure continued data quality. However, discrepancies observed between corrections illustrate challenges in determining correct values even with audits.
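The agreement summaries reported above (same assessment, both corrected, matching corrections) reduce to simple entry-level tabulations. The sketch below shows one way to compute them from a hypothetical findings table; the column names and toy values are assumptions for illustration, not the CCASAnet audit data structure.

```python
"""Sketch of entry-level agreement summaries between self- and travel-audits (hypothetical data)."""
import pandas as pd

# One row per audited data entry: each audit's assessment and corrected value, if any.
entries = pd.DataFrame({
    "self_assessment":   ["correct", "incorrect", "incorrect", "correct"],
    "travel_assessment": ["correct", "incorrect", "incorrect", "incorrect"],
    "self_correction":   [None, "2015-03-01", "450", None],
    "travel_correction": [None, "2015-03-01", "460", "520"],
})

# Proportion of entries receiving the same assessment from both audit types.
same_assessment = entries["self_assessment"] == entries["travel_assessment"]
print("same assessment:", same_assessment.mean())

# Among entries both flagged incorrect: how many were corrected by both,
# and how many of those corrections agree.
both_incorrect = same_assessment & (entries["self_assessment"] == "incorrect")
both_corrected = (both_incorrect
                  & entries["self_correction"].notna()
                  & entries["travel_correction"].notna())
matching = both_corrected & (entries["self_correction"] == entries["travel_correction"])
print("both incorrect:", both_incorrect.sum(),
      "| both corrected:", both_corrected.sum(),
      "| same correction:", matching.sum())
```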