Lee, Walsh, and Wang (2015), building on Uzzi, Mukherjee, Stringer, and Jones (2013), and Wang, Veugelers, and Stephan (2017) proposed scores, based on cited-reference (cited-journal) data, that can be used to measure the novelty of papers (referred to in this study as novelty scores U and W). Although previous research has used novelty scores in various empirical analyses, to the best of our knowledge no published study has yet quantitatively tested their convergent validity: do these scores measure what they purport to measure? Using novelty assessments by faculty members (FMs) at F1000Prime for comparison, we tested the convergent validity of the two novelty scores (U and W). FMs' assessments refer not only to the quality of biomedical papers but also to their characteristics, which FMs indicate by assigning tags to the papers: for example, are the presented findings or formulated hypotheses novel (tags "new finding" and "hypothesis")? We used these and other tags to investigate the convergent validity of both novelty scores. The two scores yield different results. The results for novelty score U mostly agree with previously formulated expectations: for instance, for a one-standard-deviation (one-unit) increase in novelty score U, the expected number of assignments of the "new finding" tag increases by 7.47%. The results for novelty score W, however, do not reflect convergent validity with the FMs' assessments: only the results for some tags agree with the expectations. Based on our results, we therefore propose the use of novelty score U for measuring novelty quantitatively, but question the use of novelty score W.
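The 7.47% figure reads most naturally as the output of a count regression with a log link (for example, a Poisson or negative binomial model). In such a model, a coefficient β on the standardized novelty score implies a 100 · (exp(β) - 1)% change in the expected tag count per standard deviation. That model form is an assumption, since it is not stated in this abstract, and the function names below are hypothetical. The sketch converts between the log-scale coefficient and the reported percentage change.

```python
import math


def pct_change_per_unit(beta: float) -> float:
    """Percentage change in the expected count for a one-unit (here: one-SD)
    increase in a predictor, ASSUMING a log-link count model
    (e.g., Poisson or negative binomial regression)."""
    return 100.0 * (math.exp(beta) - 1.0)


def beta_from_pct_change(pct: float) -> float:
    """Inverse of pct_change_per_unit: recover the log-scale coefficient
    implied by a reported percentage change (hypothetical helper)."""
    return math.log1p(pct / 100.0)


if __name__ == "__main__":
    # Coefficient implied by a 7.47% increase per SD (under the assumed model).
    beta = beta_from_pct_change(7.47)
    print(f"implied coefficient: {beta:.4f}")
    # Round trip back to the percentage change.
    print(f"percentage change per SD: {pct_change_per_unit(beta):.2f}%")
```

Because the effect is multiplicative on the expected count, two standard deviations would correspond to a factor of exp(2β), not a simple doubling of 7.47%.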