Objective
The aim of this study was to collect and synthesize evidence regarding data quality problems encountered when working with variables related to social determinants of health (SDoH).
Materials and Methods
We conducted a systematic review of the literature on social determinants research and data quality, then iteratively identified themes using content analysis.
Results
The most commonly represented quality issue associated with SDoH data is plausibility (n = 31, 41%). Factors related to race and ethnicity have the largest body of literature (n = 40, 53%). The first theme, noted in 62% (n = 47) of articles, is that data quality problems often result in bias or validity issues. The most frequently identified validity issue is misclassification bias (n = 23, 30%). The second theme is that many articles suggest methods for mitigating the issues that result from poor social determinants data quality. We grouped these into 5 suggestions: avoid complete case analysis, impute data, rely on multiple sources, use validated software tools, and select addresses thoughtfully.
Discussion
The type of data quality problem varies depending on the variable, and each problem is associated with particular forms of analytical error. Problems encountered with the quality of SDoH data are rarely distributed randomly. Data from Hispanic patients are more prone to issues with plausibility and misclassification than data from other racial/ethnic groups.
Conclusion
Consideration of data quality and evidence-based quality improvement methods may help prevent bias and improve the validity of research conducted with SDoH data.