Background
The Gail model has been widely used and validated, with conflicting results. The current study aims to evaluate the performance of different versions of the Gail model by means of systematic review and meta-analysis with trial sequential analysis (TSA).
Methods
Three systematic reviews and meta-analyses were conducted. The pooled expected-to-observed (E/O) ratio and pooled area under the curve (AUC) were calculated using the DerSimonian and Laird random-effects model. Pooled sensitivity, specificity and diagnostic odds ratio were estimated with a bivariate mixed-effects model. TSA was also conducted to determine whether the evidence was sufficient and conclusive.
Results
Gail model 1 accurately predicted breast cancer risk in American women (pooled E/O = 1.03; 95% CI 0.76–1.40). The pooled E/O ratios of the Caucasian-American Gail model 2 in American, European and Asian women were 0.98 (95% CI 0.91–1.06), 1.07 (95% CI 0.66–1.74) and 2.29 (95% CI 1.95–2.68), respectively. Additionally, the Asian-American Gail model 2 overestimated the risk for Asian women by a factor of about two (pooled E/O = 1.82; 95% CI 1.31–2.51).
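The Methods pool E/O ratios and AUCs with the DerSimonian and Laird random-effects model. What follows is a minimal Python sketch of that estimator, not the authors' software. It assumes each study reports a ratio with a 95% CI that is symmetric on the log scale, so the log-scale standard error can be backed out of the interval. The function names `dersimonian_laird`, `se_from_ci` and `pool_eo_ratios` are hypothetical. The core estimator works on any scale, so AUCs could be passed to it directly with their own standard errors.

```python
import math

Z95 = 1.959964  # two-sided 95% normal quantile


def dersimonian_laird(estimates, ses):
    """Pool per-study estimates with the DerSimonian-Laird random-effects model.

    Returns (pooled estimate, its standard error, between-study variance tau^2).
    """
    w = [1.0 / s ** 2 for s in ses]  # inverse-variance (fixed-effect) weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sw
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = len(estimates) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    # Method-of-moments tau^2, truncated at zero (zero when only one study)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_star = [1.0 / (s ** 2 + tau2) for s in ses]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2


def se_from_ci(lower, upper, z=Z95):
    """Log-scale SE of a ratio, assuming its 95% CI is symmetric on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def pool_eo_ratios(studies):
    """studies: list of (E/O, lower95, upper95). Returns (E/O, lower, upper, tau^2)."""
    logs = [math.log(eo) for eo, _, _ in studies]
    ses = [se_from_ci(lo, hi) for _, lo, hi in studies]
    pooled, se, tau2 = dersimonian_laird(logs, ses)
    # Back-transform the pooled log ratio and its CI to the ratio scale
    return (math.exp(pooled),
            math.exp(pooled - Z95 * se),
            math.exp(pooled + Z95 * se),
            tau2)
```

Pooling on the log scale keeps over- and underestimation symmetric. For example, an E/O of 2.0 and one of 0.5 average to 1.0 rather than 1.25. A pooled E/O above 1 means the model predicted more cases than were observed.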
TSA showed that the evidence in Asian women was sufficient; nonetheless, the results in American and European women need further verification.
The pooled AUCs for Gail model 1 were 0.55 (95% CI 0.53–0.56) in American and European women and 0.75 (95% CI 0.63–0.88) in Asian women, and the pooled AUCs of the Caucasian-American Gail model 2 for American, Asian and European women were 0.61 (95% CI 0.59–0.63), 0.55 (95% CI 0.52–0.58) and 0.58 (95% CI 0.55–0.62), respectively.
The pooled sensitivity, specificity and diagnostic odds ratio of Gail model 1 were 0.63 (95% CI 0.27–0.89), 0.91 (95% CI 0.87–0.94) and 17.38 (95% CI 2.66–113.70), respectively, and the corresponding indexes of Gail model 2 were 0.35 (95% CI 0.17–0.59), 0.86 (95% CI 0.76–0.92) and 3.38 (95% CI 1.40–8.17), respectively.
Conclusions
The Gail model was more accurate in predicting the incidence of breast cancer in American and European women, but far less useful for individual-level risk prediction. Moreover, the Gail model may overestimate the risk in Asian women. These results were further validated by TSA, which is an addition to the three previous systematic reviews and meta-analyses.
Trial registration
PROSPERO CRD42016047215.
Electronic supplementary material
The online version of this article (10.1186/s13058-018-0947-5) contains supplementary material, which is available to authorized users.
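In the Results, the diagnostic odds ratio (DOR) is tied to sensitivity and specificity. It is the odds of a positive call among women who developed breast cancer divided by the odds of a positive call among those who did not. The following Python sketch shows that relationship. The helper name `diagnostic_odds_ratio` is hypothetical, and the sketch does not reimplement the bivariate mixed-effects model itself.

```python
def diagnostic_odds_ratio(sensitivity, specificity):
    """DOR = [sens / (1 - sens)] / [(1 - spec) / spec].

    Both inputs must lie strictly between 0 and 1; otherwise an odds is
    zero or infinite and the ratio is undefined.
    """
    if not (0.0 < sensitivity < 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity and specificity must be in (0, 1)")
    odds_pos_in_cases = sensitivity / (1.0 - sensitivity)       # true positive odds
    odds_pos_in_noncases = (1.0 - specificity) / specificity    # false positive odds
    return odds_pos_in_cases / odds_pos_in_noncases
```

Plugging in the rounded pooled point estimates gives roughly 17.2 for Gail model 1 (0.63, 0.91) and 3.3 for Gail model 2 (0.35, 0.86). These differ slightly from the published 17.38 and 3.38, presumably because the published sensitivity and specificity are rounded to two decimals.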