Background
Delivering efficient and effective healthcare is crucial for a condition as burdensome as low back pain (LBP). Stratified care strategies may be worthwhile, but rely on early and accurate patient screening using a valid and reliable instrument. The purpose of this study was to evaluate the performance of LBP screening instruments for determining risk of poor outcome in adults with LBP of less than 3 months' duration.

Methods
Medline, Embase, CINAHL, PsycINFO, PEDro, Web of Science, SciVerse SCOPUS, and Cochrane Central Register of Controlled Trials were searched from June 2014 to March 2016. Prospective cohort studies involving patients with acute and subacute LBP were included. Studies administered a prognostic screening instrument at inception and reported outcomes at least 12 weeks after screening. Two independent reviewers extracted relevant data using a standardised spreadsheet. We defined poor outcome for pain to be ≥ 3 on an 11-point numeric rating scale and poor outcome for disability to be scores of ≥ 30% disabled (on the study authors' chosen disability outcome measure).

Results
We identified 18 eligible studies investigating seven instruments. Five studies investigated the STarT Back Tool: performance was ‘non-informative’ for discriminating pain outcomes at follow-up (pooled AUC = 0.59 (0.55–0.63), n = 1153) and ‘acceptable’ for discriminating disability outcomes (pooled AUC = 0.74 (0.66–0.82), n = 821). Seven studies investigated the Orebro Musculoskeletal Pain Screening Questionnaire: performance was ‘poor’ for discriminating pain outcomes (pooled AUC = 0.69 (0.62–0.76), n = 360), ‘acceptable’ for disability outcomes (pooled AUC = 0.75 (0.69–0.82), n = 512), and ‘excellent’ for absenteeism outcomes (pooled AUC = 0.83 (0.75–0.90), n = 243).
Two studies investigated the Vermont Disability Prediction Questionnaire and four further instruments were investigated in single studies only.

Conclusions
LBP screening instruments administered in primary care perform poorly at assigning higher risk scores to individuals who develop chronic pain than to those who do not. Risks of a poor disability outcome and prolonged absenteeism are likely to be estimated with greater accuracy. Clinicians who use screening tools to obtain prognostic information should consider the potential for misclassification of patient risk and its consequences for care decisions based on screening. However, in some cases the outcomes against which we evaluated these screening instruments differed in threshold, outcome, and time period from those the instruments were designed to predict.

Systematic review registration
PROSPERO international prospective register of systematic reviews, registration number CRD42015015778.

Electronic supplementary material
The online version of this article (doi:10.1186/s12916-016-0774-4) contains supplementary material, which is available to authorized users.
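
As an illustration of the discrimination statistic reported in the Results, the sketch below dichotomises follow-up scores using the poor-outcome thresholds given in the Methods and computes an AUC as the proportion of poor-outcome/good-outcome patient pairs in which the poor-outcome patient had the higher baseline screening score (ties count as half). This is a minimal sketch, not the review's analysis code: the function names and the patient data are hypothetical, and it does not reproduce the pooling across studies.

```python
def poor_pain_outcome(nrs_score):
    """Poor pain outcome per the Methods: >= 3 on the 0-10 numeric rating scale."""
    return nrs_score >= 3


def poor_disability_outcome(percent_disabled):
    """Poor disability outcome per the Methods: >= 30% disabled."""
    return percent_disabled >= 30


def auc(risk_scores, poor_outcomes):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen patient with a poor
    outcome had a higher baseline risk score than a randomly chosen
    patient with a good outcome; tied scores count as half a win.
    """
    pos = [s for s, poor in zip(risk_scores, poor_outcomes) if poor]
    neg = [s for s, poor in zip(risk_scores, poor_outcomes) if not poor]
    if not pos or not neg:
        raise ValueError("need at least one patient in each outcome group")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical data (not from the review): baseline screening scores and
# pain scores at follow-up for six patients.
baseline_scores = [2, 7, 4, 6, 3, 5]
followup_pain = [1, 6, 4, 2, 0, 3]

outcomes = [poor_pain_outcome(p) for p in followup_pain]
example_auc = auc(baseline_scores, outcomes)  # 7 of 9 pairs ranked correctly
print(f"AUC = {example_auc:.2f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which is the scale on which the pooled estimates above are reported.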