Objective The aim of this study was to collect and synthesize evidence regarding data quality problems encountered when working with variables related to social determinants of health (SDoH). Materials and Methods We conducted a systematic review of the literature on social determinants research and data quality and then iteratively identified themes in the literature using a content analysis process. Results The most commonly represented quality issue associated with SDoH data is plausibility (n = 31, 41%). Factors related to race and ethnicity have the largest body of literature (n = 40, 53%). The first theme, noted in 62% (n = 47) of articles, is that bias or validity issues often result from data quality problems. The most frequently identified validity issue is misclassification bias (n = 23, 30%). The second theme is that many of the articles suggest methods for mitigating the issues resulting from poor social determinants data quality. We grouped these into 5 suggestions: avoid complete case analysis, impute data, rely on multiple sources, use validated software tools, and select addresses thoughtfully. Discussion The type of data quality problem varies depending on the variable, and each problem is associated with particular forms of analytical error. Problems encountered with the quality of SDoH data are rarely distributed randomly. Data from Hispanic patients are more prone to issues with plausibility and misclassification than data from other racial/ethnic groups. Conclusion Consideration of data quality and evidence-based quality improvement methods may help prevent bias and improve the validity of research conducted with SDoH data.
Context: Although screening recommendations for prostate cancer using prostate-specific antigen testing often include shared decision making, the effect of patient decision aids on patients’ intention and uptake is unclear. This study aimed to review the effect of decision aids on men’s screening intention, screening utilization, and the congruence between intentions and uptake. Evidence acquisition: Data sources were searched until April 6, 2018, and included MEDLINE, Scopus, CENTRAL, CT.gov, Cochrane report, PsycARTICLES, PsycINFO, and reference lists. This study included RCTs and observational studies of decision aids that measured prostate screening intention or behavior. The analysis was completed in April 2018. Evidence synthesis: Eighteen studies (13 RCTs, four before–after studies, one non-RCT) reported data on screening intention for ≅8,400 men and screening uptake for 2,385 men. Compared with usual care, the use of decision aids in any format results in fewer men (aged ≥40 years) planning to undergo prostate-specific antigen testing (risk ratio=0.88, 95% CI=0.81, 0.95, p=0.006, I2=66%, p<0.001, n=8). Many men did not follow their screening intentions during the first year after using a decision aid; however, most men who were planning to undergo screening did so (probability that men who wanted to be screened would receive screening was 95%). Conclusions: Integration of decision aids in clinical practice may result in a decrease in the number of men who elect prostate-specific antigen testing, which may in turn reduce screening uptake. To ensure high congruence between intention and screening utilization, providers should not delay the shared decision-making discussion after patients use a decision aid.
Background The adverse impact of COVID-19 on marginalized and under-resourced communities of color has highlighted the need for accurate, comprehensive race and ethnicity data. However, a significant technical challenge related to integrating race and ethnicity data in large, consolidated databases is the lack of consistency in how data about race and ethnicity are collected and structured by health care organizations. Objective This study aims to evaluate and describe variations in how health care systems collect and report information about the race and ethnicity of their patients and to assess how well these data are integrated when aggregated into a large clinical database. Methods At the time of our analysis, the National COVID Cohort Collaborative (N3C) Data Enclave contained records from 6.5 million patients contributed by 56 health care institutions. We quantified the variability in the harmonized race and ethnicity data in the N3C Data Enclave by analyzing the conformance to health care standards for such data. We conducted a descriptive analysis by comparing the harmonized data available for research purposes in the database to the original source data contributed by health care institutions. To make the comparison, we tabulated the original source codes, enumerating how many patients had been reported with each encoded value and how many distinct ways each category was reported. The nonconforming data were also cross tabulated by 3 factors: patient ethnicity, the number of data partners using each code, and which data models utilized those particular encodings. For the nonconforming data, we used an inductive approach to sort the source encodings into categories. For example, values such as “Declined” were grouped with “Refused,” and “Multiple Race” was grouped with “Two or more races” and “Multiracial.” Results “No matching concept” was the second largest harmonized concept used by the N3C to describe the race of patients in their database. 
In addition, 20.7% of the race data did not conform to the standard; the largest category was data that were missing. Hispanic or Latino patients were overrepresented in the nonconforming racial data, and data from American Indian or Alaska Native patients were obscured. Although only a small proportion of the source data had not been mapped to the correct concepts (0.6%), Black or African American and Hispanic/Latino patients were overrepresented in this category. Conclusions Differences in how race and ethnicity data are conceptualized and encoded by health care institutions can affect the quality of the data in aggregated clinical databases. The impact of data quality issues in the N3C Data Enclave was not equal across all races and ethnicities, which has the potential to introduce bias in analyses and conclusions drawn from these data. Transparency about how data have been transformed can help users make accurate analyses and inferences and eventually better guide clinical care and public policy.
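The tabulation and inductive grouping step described in the Methods can be illustrated with a minimal Python sketch. It is not the N3C pipeline: the `GROUPS` mapping, the `"Ungrouped"` fallback label, and the sample records are hypothetical, and only the two groupings named in the abstract ("Declined" with "Refused"; "Multiple Race" with "Two or more races" and "Multiracial") are taken from the source.

```python
from collections import Counter

# Hypothetical lookup from raw source encodings to grouped categories.
# Only the groupings named in the abstract are included; the category
# labels on the right are illustrative choices.
GROUPS = {
    "declined": "Refused",
    "refused": "Refused",
    "multiple race": "Multiracial",
    "two or more races": "Multiracial",
    "multiracial": "Multiracial",
}


def group_encoding(raw):
    """Map one raw source value to a grouped category (case-insensitive)."""
    return GROUPS.get(raw.strip().lower(), "Ungrouped")


def tabulate(records):
    """Count patients per category and distinct raw encodings per category.

    `records` is an iterable of (patient_id, raw_value) pairs.
    """
    patients = Counter()
    encodings = {}
    for _patient_id, raw in records:
        category = group_encoding(raw)
        patients[category] += 1
        encodings.setdefault(category, set()).add(raw.strip())
    distinct = {cat: len(values) for cat, values in encodings.items()}
    return patients, distinct


# Made-up sample records, for illustration only.
sample = [
    (1, "Declined"),
    (2, "Refused"),
    (3, "Multiple Race"),
    (4, "Two or more races"),
    (5, "Multiracial"),
    (6, "refused"),
]
patient_counts, distinct_counts = tabulate(sample)
print(dict(patient_counts))
print(distinct_counts)
```

Counting distinct raw spellings separately from patient counts mirrors the two quantities the abstract says were enumerated: how many patients carried each encoded value, and how many distinct ways each category was reported.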
BACKGROUND A significant technical challenge related to integrating race and ethnicity data across EHR systems is the lack of consistency in how data about race and ethnicity are collected and structured by healthcare organizations. OBJECTIVE To evaluate and describe variations in how healthcare systems collect and report information about the race and ethnicity of their patients, and how these data are integrated when they are aggregated into a large clinical database. METHODS At the time of our analysis, the National COVID Cohort Collaborative (N3C) Data Enclave contained records from 6.5 million patients contributed by 56 healthcare institutions. We assessed the quality of race and ethnicity data by analyzing their conformance to federal standards, then drilled into the non-conforming data. RESULTS “No matching category” was the second largest harmonized racial group in the N3C. 20.7% of the race data did not conform to the federal standard; the largest category was data that were missing. Hispanic or Latino patients were over-represented in the non-conforming racial data, and data from American Indian or Alaska Native patients were obscured. Although only a small proportion of the source data had not been mapped to the correct concepts (0.6%), Black or African-American and Hispanic/Latino patients were over-represented in this category. CONCLUSIONS The impact of data quality issues was not equal across all races and ethnicities, which has the potential to introduce bias in analyses and conclusions drawn from these data. The adverse impact of COVID-19 on marginalized and under-resourced communities of color has highlighted the need for accurate, comprehensive race and ethnicity data. Differences in how race and ethnicity data are conceptualized and encoded by healthcare institutions can affect the quality of the data in aggregated clinical databases. 
Transparency about how data have been transformed can help users make accurate analyses and inferences, and eventually better guide clinical care and public policy.
Purpose Because pulmonary exacerbations in cystic fibrosis cause a step-wise decline in FEV1 function and contribute significantly to disease progression, it is important to identify potential environmental triggers. Studies have been done on general air quality and its relationship to cystic fibrosis disease activity, but none have examined air pollution caused by wildfire smoke. Our study intends to better understand this relationship. Methods A retrospective cohort study was conducted using data collected from people with cystic fibrosis (CF) between 2012 and 2019. Data on pulmonary exacerbations were extracted from the patient registry hosted and maintained by the Cystic Fibrosis Foundation. Exposures were determined using measurements of fine particulate matter (PM2.5) from the Environmental Protection Agency. A logistic regression model was created in order to identify both univariate and adjusted odds ratios and their associated confidence intervals. Results 82.7% (n = 415) of individuals with CF experienced an exposure to wildfire smoke during the study period. The adjusted odds ratio for a pulmonary exacerbation within one month following an exposure to wildfire smoke was 1.50 (95% CI = 1.13 – 1.99, p = 0.006) for adults and 0.92 (95% CI = 0.69 – 1.23, p = 0.578) for children. Conclusion Wildfire smoke exposure is associated with severe pulmonary exacerbation in adults but not in children. This suggests that wildfire smoke may be an environmental risk factor for exacerbation in adults with CF. Further study is needed to understand why and how wildfire smoke exposure affects adults with CF differently than the pediatric population.
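As a reading aid for the reported odds ratios, the short Python sketch below shows how a 95% confidence interval relates to its point estimate on the log scale. It assumes the intervals are Wald-type (symmetric around the log odds ratio), which the abstract does not state; the function name `log_scale_summary` is hypothetical, and the only inputs are the rounded interval bounds quoted above.

```python
import math


def log_scale_summary(lower, upper, z=1.96):
    """Return the geometric midpoint of a ratio CI and the implied log-scale SE.

    Assumes a Wald-type interval, exp(log(OR) +/- z * SE); this is an
    assumption, not something reported in the abstract.
    """
    log_lower, log_upper = math.log(lower), math.log(upper)
    midpoint = math.exp((log_lower + log_upper) / 2)
    standard_error = (log_upper - log_lower) / (2 * z)
    return midpoint, standard_error


# Rounded interval bounds as quoted in the abstract.
adult_mid, adult_se = log_scale_summary(1.13, 1.99)
child_mid, child_se = log_scale_summary(0.69, 1.23)
print(round(adult_mid, 2), round(adult_se, 3))
print(round(child_mid, 2), round(child_se, 3))
```

Under that assumption, the geometric midpoints come to about 1.50 for adults and 0.92 for children, matching the reported odds ratios. Because the published bounds are rounded, the implied standard errors are approximate.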