Background
Breast cancer is the most common cancer in women worldwide, with a great diversity in outcomes among individual patients. The ability to accurately predict a breast cancer outcome is important to patients, physicians, researchers, and policy makers. Many models have been developed and tested in different settings. We systematically reviewed the prognostic models developed and/or validated for patients with breast cancer.

Methods
We conducted a systematic search in four electronic databases and some oncology websites, and a manual search in the bibliographies of the included studies. We identified original studies that were published prior to 1st January 2017, and presented the development and/or validation of models based mainly on clinico-pathological factors to predict mortality and/or recurrence in female breast cancer patients.

Results
From the 96 articles selected from 4095 citations found, we identified 58 models, which predicted mortality (n = 28), recurrence (n = 23), or both (n = 7). The most frequently used predictors were nodal status (n = 49), tumour size (n = 42), tumour grade (n = 29), age at diagnosis (n = 24), and oestrogen receptor status (n = 21). Models were developed in Europe (n = 25), Asia (n = 13), North America (n = 12), and Australia (n = 1) between 1982 and 2016. Models were validated in the development cohorts (n = 43) and/or independent populations (n = 17), by comparing the predicted outcomes with the observed outcomes (n = 55) and/or with the outcomes estimated by other models (n = 32), or the outcomes estimated by individual prognostic factors (n = 8).
The most commonly used methods were: Cox proportional hazards regression for model development (n = 32); the absolute differences between the predicted and observed outcomes (n = 30) for calibration; and C-index/AUC (n = 44) for discrimination. Overall, the models performed well in the development cohorts but less accurately in some independent populations, particularly in high-risk patients and in young and elderly patients. An exception is the Nottingham Prognostic Index, which retains its predictive ability in most independent populations.

Conclusions
Many prognostic models have been developed for breast cancer, but only a few have been validated widely in different settings. Importantly, their performance was suboptimal in independent populations, particularly in high-risk patients and in young and elderly patients.

Electronic supplementary material
The online version of this article (10.1186/s12885-019-5442-6) contains supplementary material, which is available to authorized users.
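
The validation measures named in the Results can be illustrated with a minimal Python sketch. It is not taken from any of the reviewed studies. The function names are hypothetical, the Nottingham Prognostic Index is computed from its commonly cited form (0.2 × tumour size in cm + nodal stage + histological grade, the latter two scored 1 to 3), the C-index is a simplified Harrell-type version that skips tied survival times, and the calibration check assumes complete follow-up at a fixed time horizon (no censoring).

```python
from itertools import combinations


def nottingham_prognostic_index(size_cm, node_stage, grade):
    """Commonly cited NPI form: 0.2 * size (cm) + nodal stage (1-3) + grade (1-3)."""
    return 0.2 * size_cm + node_stage + grade


def harrell_c_index(times, events, risk_scores):
    """Simplified Harrell-type C-index (discrimination).

    Among usable patient pairs, returns the fraction in which the patient
    with the shorter observed time has the higher risk score. Ties in risk
    score count as 0.5. Pairs with tied times are skipped (a simplification).
    """
    concordant = 0.0
    usable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # tied times: skipped in this sketch
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored: the pair is not usable
        usable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


def calibration_abs_difference(predicted_probs, observed_events):
    """Absolute difference between mean predicted and observed event proportion.

    Assumes every patient was followed to the prediction horizon.
    """
    mean_predicted = sum(predicted_probs) / len(predicted_probs)
    observed = sum(observed_events) / len(observed_events)
    return abs(mean_predicted - observed)
```

A C-index of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking. With censored follow-up, the observed proportion in the calibration check would normally be replaced by a Kaplan-Meier estimate at the chosen horizon.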