“…The hypotheses were primarily correlations with other PROMs, e.g., the Cincinnati score, the Lysholm score and the WOMAC score (Supplementary File 2). Of these, five studies were rated “good” [7, 8, 20, 29, 50] and five studies “fair” [3, 8, 9, 27, 28, 38], mainly due to a “vague hypothesis” or “moderate sample size.” Six studies were rated “poor” for methodological quality [1, 14, 21, 22, 42, 43] due to “no description of the constructs measured by the comparator instrument,” “no measurement properties reported on comparator instrument,” “unclear what was expected from hypothesis,” or “unclear what was expected regarding correlations or differences.” A positive rating for the results was given when at least 75% of the hypotheses were confirmed [44]. More than 75% of the hypotheses were confirmed in nine studies (two with a methodological score of “good,” four “fair” and three “poor”) [1, 3, 9, 14, 20, 27, 29, 42], and fewer than 75% in four studies (three with a methodological score of “good” and one “fair”) [7, 8, 28, 50].…”