Background Work-hour restrictions and fatigue management strategies in surgical training programs continue to evolve in an effort to improve the learning environment and promote safer patient care. In response, training programs must reevaluate how teaching modalities such as simulation can augment the development of surgical competence in trainees. For surgical simulators to be most useful, it is important to determine whether they can reliably differentiate surgical proficiency. To our knowledge, performance on both virtual and benchtop arthroscopy simulators has not been concurrently assessed in the same subjects.
Questions/purposes (1) Do global rating scales and procedure time differentiate arthroscopic expertise in virtual and benchtop knee models? (2) Can commercially available built-in motion analysis metrics differentiate arthroscopic expertise? (3) How well are performance measures on virtual and benchtop simulators correlated? (4) Are these metrics sensitive enough to differentiate by year of training?
Methods In this cross-sectional study, 19 subjects (four medical students, 12 residents, and three staff) were recruited and divided into 11 novice arthroscopists (student to Postgraduate Year [PGY] 3) and eight proficient arthroscopists (PGY 4 to staff). All subjects completed a diagnostic arthroscopy and loose-body retrieval in both virtual and benchtop knee models. Global rating scales (GRS), procedure times, and motion analysis metrics were used to evaluate performance.
Results [...] CI, seconds, p = 0.002). The built-in motion analysis metrics also distinguished novices from proficient arthroscopists using the self-generated virtual loose-body retrieval task scores (4 ± 1 [95% CI, 3-5] versus 6 ± 1 [95% CI, 5-7], p = 0.001). GRS scores between virtual and benchtop models were very strongly correlated (ρ = 0.93, p < 0.001). There was a strong correlation between year of training and virtual GRS (ρ = 0.8, p < 0.001) and benchtop GRS (ρ = 0.87, p < 0.001) scores.
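As an illustration of the correlation figures reported above, the following is a minimal sketch of how a rank correlation between paired virtual and benchtop GRS scores could be computed. It assumes the reported coefficient is Spearman's rank correlation, which the abstract does not state explicitly. The function names (`average_ranks`, `spearman_rho`) and the six score pairs are hypothetical and are not study data.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one sample is constant")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical GRS scores for six imaginary trainees (NOT study data).
virtual_grs = [12, 15, 19, 22, 27, 30]
benchtop_grs = [14, 13, 20, 24, 26, 31]
rho = spearman_rho(virtual_grs, benchtop_grs)
```

For these made-up scores, `rho` is about 0.94, because only two adjacent trainees swap rank between the two models. Working on ranks rather than raw scores is a natural choice for ordinal rating scales such as the GRS, although the abstract does not describe the authors' actual statistical procedure.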
Conclusions To our knowledge, this is the first study to evaluate performance on both virtual and benchtop knee simulators. We have shown that subjective GRS scores, objective motion analysis metrics, and procedure time are valid measures for distinguishing arthroscopic skill on both virtual and benchtop modalities. Performance on the two modalities is well correlated. We believe that training on artificial models allows acquisition of skills in a safe environment. Future work should compare the modalities with respect to efficiency of skill acquisition, skill retention, and transferability to the operating room.
Clin Orthop Relat Res (2016) 474:956-964. DOI 10.1007/s11999-015-4510-8. Clinical Orthopaedics and Related Research®, A Publication of The Association of Bone and Joint Surgeons®
GRS scales. The proficient subjects completed nearly all tasks faster