OBJECTIVE:
To develop and validate a ChatGPT-powered tool for assessing letters of recommendation from obstetrics and gynecology residency applicants.
METHODS:
This study was conducted in two phases at a university-based obstetrics and gynecology residency program. During the development phase, 30 letters of recommendation were used to create ChatGPT prompts based on a validated scoring rubric that assesses phrases, letter features, and applicant abilities. ChatGPT scores were compared with benchmark scores using intraclass correlation coefficients (ICCs) to evaluate reliability, and ChatGPT processing times were measured against those of human reviewers. In the second phase (validation), 88 letters of recommendation from residents with academic deficiencies (n=30) and from residents without noted deficiencies were analyzed using the ChatGPT prompts developed and refined in the first phase. Residents with academic deficiencies (case group) were matched 1:1 with a control group of residents without noted deficiencies by medical school, gender, and training period. The letter of recommendation scores between the case and control groups were then compared using the Mann-Whitney U test.
RESULTS:
ChatGPT significantly reduced review time per letter, from 7.2 minutes to 45 seconds. Inter-rater reliability between ChatGPT and benchmark scores indicated strong agreement for specific phrases (ICC 0.82, 95% CI, 0.75–0.88), moderate agreement for letter features (ICC 0.75, 95% CI, 0.68–0.82), and fair agreement for applicant abilities (ICC 0.70, 95% CI, 0.62–0.78). After refinement, ICCs improved to 0.92 (95% CI, 0.88–0.95), 0.81 (95% CI, 0.74–0.87), and 0.77 (95% CI, 0.69–0.84). Mean letter of recommendation scores for specific phrases differed significantly between residents with deficiencies (4.49; 95% CI, 3.29–5.68) and residents in the control group (6.00; 95% CI, 5.00–7.00) (P=.017). Subgroup analysis showed an even greater gap between residents with multiple deficiencies (2.15; 95% CI, 1.42–3.38) and residents in the control group (6.00; 95% CI, 5.00–7.00; P<.001), suggesting a higher predictive value for at-risk residents.
CONCLUSION:
ChatGPT can enhance the residency selection process by providing reliable letter of recommendation analysis significantly faster than human reviewers and by identifying key phrases in letters of recommendation that may predict future academic deficiencies.