Objective. To evaluate the performance and applicability of multivariable prediction models for osteoarthritis (OA). Methods. This was a systematic review and narrative synthesis of 3 databases (EMBASE, PubMed, and Web of Science) searched from inception to December 2021. We included general-population longitudinal studies reporting derivation, comparison, or validation of multivariable models to predict individual risk of incident OA, defined by recognized clinical or imaging criteria. We excluded studies reporting prevalent OA and joint arthroplasty outcomes. Paired reviewers independently performed article selection, data extraction, and risk-of-bias assessment. Model performance, calibration, and retained predictors were summarized. Results. A total of 26 studies were included, reporting 31 final multivariable prediction models for incident knee (n = 23), hip (n = 4), hand (n = 3), and any-site (n = 1) OA, with a median of 121.5 outcome events (range 27-12,803), a median prediction horizon of 8 years (range 2-41), and a median of 6 predictors (range 3-24). Age, body mass index, previous injury, and occupational exposures were among the most commonly included predictors. Model discrimination after validation was generally acceptable to excellent (area under the curve 0.70-0.85). Most models underwent either internal or external validation, although the risk of bias was often judged to be high and applicability to mass use in diverse populations was limited. Conclusion. Despite growing interest in multivariable prediction models for incident OA, the focus remains predominantly on the knee, with reliance on a small pool of appropriate cohort data sets and concerns over general-population applicability.