Introduction: Simulation tools to assess prehospital team performance and identify patient safety events are lacking. We adapted a simulation model and checklist tool originally designed to assess individual paramedic performance so that it could assess prehospital team performance, and we tested its interrater reliability.

Methods: We used a modified Delphi process to adapt 3 simulation cases (cardiopulmonary arrest, seizure, asthma) and their checklist to add remote physician direction, target infant patients, and evaluate teams of 2 paramedics and 1 physician. Team performance was assessed with a checklist of steps scored as complete/incomplete by raters using direct observation or video review. The composite performance score was the percentage of completed steps (see the illustrative sketch below). Interrater percent agreement was compared with that of the original tool. The tool was modified, and raters were trained, in iterative rounds until composite performance scoring agreement was 0.80 or greater (scale: <0.20 = poor; 0.21-0.39 = fair; 0.40-0.59 = moderate; 0.60-0.79 = good; 0.80-1.00 = very good).

Results: We achieved very good interrater agreement for scoring composite performance in 2 rounds using 6 prehospital teams and 4 raters. The original 175-step tool was modified to 171 steps. Interrater percent agreement for the final modified tool approximated that of the original tool for the composite checklist (0.80 vs. 0.85), the cardiopulmonary arrest case (0.82 vs. 0.86), and the asthma case (0.80 vs. 0.77) but was lower for the seizure case (0.76 vs. 0.91). Most checklist items (137/171, 80%) had good to very good agreement. Among the 34 items with fair to moderate agreement, 15 (44%) related to patient assessment, 9 (26%) to equipment use, 6 (18%) to medication delivery, and 4 (12%) to cardiopulmonary resuscitation quality.

Conclusions: The modified checklist has very good interrater agreement for assessing composite prehospital team performance and can be used to test the effects of patient safety interventions.
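
To make the two Methods measures concrete, here is a minimal sketch (illustrative only; the 10-step checklist and rating values are hypothetical, not study data) of how a composite performance score and interrater percent agreement could be computed from complete/incomplete step ratings.

```python
def composite_performance(steps: list[int]) -> float:
    """Composite performance score: fraction of checklist steps completed,
    where each step is scored 1 (complete) or 0 (incomplete)."""
    return sum(steps) / len(steps)

def percent_agreement(rater_a: list[int], rater_b: list[int]) -> float:
    """Interrater percent agreement: fraction of steps scored
    identically by two raters."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

# Hypothetical 10-step checklist scored by two raters (not from the study).
rater_a = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
rater_b = [1, 1, 0, 1, 0, 1, 0, 1, 1, 1]
print(f"Composite (rater A): {composite_performance(rater_a):.2f}")  # 0.80
print(f"Agreement: {percent_agreement(rater_a, rater_b):.2f}")       # 0.90
```

Under the study's scale, the 0.90 agreement in this toy example would fall in the very good range (0.80-1.00), the threshold the raters trained to reach.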