BackgroundTeamwork in the operating theatre is becoming increasingly recognized as a major factor in clinical outcomes. Many tools have been developed to measure teamwork. Most fall into two categories: self‐assessment by theatre staff and assessment by observers. A critical and comparative analysis of the validity and reliability of these tools is lacking.MethodsMEDLINE and Embase databases were searched following PRISMA guidelines. Content validity was assessed using measurements of inter‐rater agreement, predictive validity and multisite reliability, and interobserver reliability using statistical measures of inter‐rater agreement and reliability. Quantitative meta‐analysis was deemed unsuitable.ResultsForty‐eight articles were selected for final inclusion; self‐assessment tools were used in 18 and observational tools in 28, and there were two qualitative studies. Self‐assessment of teamwork by profession varied with the profession of the assessor. The most robust self‐assessment tool was the Safety Attitudes Questionnaire (SAQ), although this failed to demonstrate multisite reliability. The most robust observational tool was the Non‐Technical Skills (NOTECHS) system, which demonstrated both test–retest reliability (P > 0·09) and interobserver reliability (Rwg = 0·96).ConclusionSelf‐assessment of teamwork by the theatre team was influenced by professional differences. Observational tools, when used by trained observers, circumvented this.