Problem:
Although holistic review has been used successfully in some residency programs to decrease bias, it is time-consuming and unsustainable for many programs without initial prescreening. The unstructured qualitative data in residency applications, including notable experiences, letters of recommendation, personal statements, and medical student performance evaluations, require extensive time, resources, and metrics to evaluate. As a result, applicant screening has historically relied heavily on quantitative metrics, which can be socioeconomically and racially biased.
Approach:
From applications submitted to the University of Utah internal medicine–pediatrics residency program from 2015 to 2019, the authors extracted relevant text snippets from the narrative sections. Expert reviewers annotated these snippets with specific values previously identified as associated with resident success: academic strength; intellectual curiosity; compassion; communication; work ethic; teamwork; leadership; self-awareness; diversity, equity, and inclusion; professionalism; and adaptability. The authors then prospectively applied a machine learning model (MLM) to snippets from 2023 applications and compared its output with a manual holistic review performed without knowledge of the MLM results.
Outcomes:
Overall, the MLM had a sensitivity of 0.64, specificity of 0.97, positive predictive value of 0.62, negative predictive value of 0.97, and F1 score of 0.63. The mean (SD) total number of annotations per application was significantly associated with interview invitation status (invited: 208.6 [59.1]; not invited: 145.2 [57.2]; p < .001). In addition, 8 of the 10 individual values were significantly predictive of whether an applicant was invited to interview.
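The reported metrics are standard confusion-matrix quantities, with the manual holistic review serving as the reference standard. The F1 score is the harmonic mean of positive predictive value (precision) and sensitivity (recall), so 2 × 0.62 × 0.64 / (0.62 + 0.64) ≈ 0.63, which matches the reported value. The minimal sketch below illustrates these definitions. The function and class names are hypothetical, and the counts in the usage line are made up for illustration only. The abstract does not report the underlying confusion-matrix counts or the unit of analysis (snippet, value, or application).

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Hypothetical confusion-matrix counts: MLM label vs. manual review (reference)."""
    tp: int  # MLM positive, manual review positive
    fp: int  # MLM positive, manual review negative
    tn: int  # MLM negative, manual review negative
    fn: int  # MLM negative, manual review positive


def classification_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Compute the five metrics named in the abstract from raw counts."""
    sensitivity = c.tp / (c.tp + c.fn)  # recall
    specificity = c.tn / (c.tn + c.fp)
    ppv = c.tp / (c.tp + c.fp)          # precision
    npv = c.tn / (c.tn + c.fn)
    f1 = f1_from(ppv, sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "ppv": ppv, "npv": npv, "f1": f1}


def f1_from(ppv: float, sensitivity: float) -> float:
    """Harmonic mean of precision (PPV) and recall (sensitivity)."""
    return 2 * ppv * sensitivity / (ppv + sensitivity)


# Consistency check using only the values reported in the abstract.
print(round(f1_from(0.62, 0.64), 2))  # 0.63

# Illustrative (made-up) counts, not data from the study.
print(classification_metrics(ConfusionCounts(tp=8, fp=2, tn=88, fn=2)))
```

Because the F1 score ignores true negatives, it stays moderate here even though specificity and NPV are high. This is typical when the positive class (a snippet or application flagged for a value) is relatively rare.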
Next Steps:
The authors created an MLM that identifies several values important for resident success in internal medicine–pediatrics programs with moderate sensitivity and high specificity. They will continue to refine the MLM by increasing the number of annotations, exploring parameter tuning and feature engineering options, and identifying which application sections are most strongly correlated with interview invitation status.