Background: The aims of this study were to assess the construct validity (dimensionality and measurement invariance) and reliability of the previously developed Cognitive module of the Portuguese Physical Literacy Assessment – Questionnaire (PPLA-Q). Secondary aims were to assess whether using distractor information increased precision, and whether a total sum-score has sufficient precision for applied physical education (PE) settings.

Methods: Parametric Item Response Theory (IRT) models were estimated using a final sample of 508 Portuguese adolescents (Mage = 16, SD = 1 years) studying in public schools in Lisbon. A retest subsample of 73 students, collected 15 days after baseline, was used to calculate the Intraclass Correlation Coefficient (ICC) and Svenson's ordinal paired agreement.

Results: A mixed 2-parameter nested logit + graded response model provided the best fit to the data, C2(21) = 23.92, p = .21; CFI = .98; RMSEA_C2 = .017 [0, .043], with no misfitting items. Modelling distractor information increased the available information and, thus, reliability. There was evidence of differential item functioning in one item, favoring male students; however, this did not translate into statistically significant differences at the test level (sDTF = -0.06; sDTF% = -0.14). Average score reliability was low (marginal reliability = .60), although adequate reliability was attained in the -2 to -1 θ range. ICC results suggested poor to moderate test-retest reliability (ICC = .56 [.38, .70]), while Svenson's method showed acceptable agreement (> .70) in 6 out of 10 items, with the 4 remaining items revealing small individual variability across time points. We found a high correlation (r = .91 [.90, .93]) between the sum-score and scores derived from the calibrated mixed model.

Conclusions: Evidence supports the construct validity of the Cognitive module of the PPLA-Q to assess Content Knowledge in the Portuguese PE context for grade 10-12 adolescents (15-18 years). The test attained acceptable reliability for distinguishing students with transitional knowledge (between Foundation and Mastery); further revisions are needed to target the full spectrum of θ. Its sum-score might be used in applied settings to get a quick overview of students' knowledge; when precision is required, the IRT score is recommended. Further scrutiny of test-retest reliability is warranted in future research, along with the use of 3-parameter logistic models.
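
As context for the distractor-information result above, the following is a minimal sketch of the two model components named in the Results, assuming the standard parameterizations of the 2-parameter nested logit (2PNL) model and the graded response model (GRM); the exact parameterization used in the study is not stated in this abstract. In the 2PNL, the probability of a correct response to multiple-choice item $i$ follows a 2-parameter logistic,

$$ P(y_i = 1 \mid \theta) = \frac{\exp[a_i(\theta - b_i)]}{1 + \exp[a_i(\theta - b_i)]}, $$

while each distractor $d$ is modelled conditionally on an incorrect response through a nominal component,

$$ P(y_i = d \mid \theta) = \bigl[1 - P(y_i = 1 \mid \theta)\bigr]\, \frac{\exp(\zeta_{id} + \lambda_{id}\theta)}{\sum_{d'} \exp(\zeta_{id'} + \lambda_{id'}\theta)}, $$

so retaining which distractor was chosen contributes information about $\theta$ beyond correct/incorrect scoring. In the GRM, for a polytomous item with ordered categories $k = 0, \dots, K_i$, category probabilities follow from cumulative 2-parameter logistic curves,

$$ P(y_i \ge k \mid \theta) = \frac{1}{1 + \exp[-a_i(\theta - b_{ik})]}, \qquad P(y_i = k \mid \theta) = P(y_i \ge k \mid \theta) - P(y_i \ge k + 1 \mid \theta). $$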