In 2013, U.S. general surgery residency programs implemented a milestones assessment framework to incorporate more competency-focused evaluation methods. Developed by surgical education leaders and other stakeholders working with the Accreditation Council for Graduate Medical Education, and recently updated to version 2.0, the surgery milestones framework is centered on 6 "core competencies": patient care, medical knowledge, practice-based learning and improvement, interpersonal and communication skills, professionalism, and systems-based practice. Prior work has focused on the validity of milestones as a measure of resident performance, but associations between general surgery residents' milestone ratings and their post-training patient outcomes have only recently been explored, in an analysis by Kendrick et al in this issue of Academic Medicine. Despite a well-designed effort to tackle this complex problem, Kendrick et al identified no such associations. This accompanying commentary discusses the broader implications of using milestone ratings beyond their intended application, alternative assessment methods, and the challenges of developing predictive assessments in the complex setting of surgical care. Although milestone ratings have not been shown to provide the specificity needed to predict clinical outcomes in the complex settings studied by Kendrick et al, hope remains that the use of other outcome measures, assessment frameworks, and data analytic tools could augment these models and further our progress toward predictive assessment in surgical education.

Evaluation of residents in general surgery residency programs has grown both more sophisticated and more complicated amid increasing patient and case complexity, time constraints, and regulation of resident supervision in the operating room.
Over the last decade, surgical education research on resident assessment has focused on measuring performance through accurate, reproducible methods with evidence for their validity, and on refining decisions about residents' preparedness for unsupervised practice.