“…The test items should cover the ability domain defined by the test framework (test blueprint; see also Pellegrino & Chudowsky, 2003; Reckase, 2017). It might be legitimate to assume that there exists a larger population of test items (henceforth denoted by I) from which the items in a particular study are chosen, and that true ability values would be defined as the outcomes of a study in which all items from the population had been chosen (Cronbach & Shavelson, 2004; see also Ellis, 2021; Kane, 1982; Brennan, 2001). Interestingly, it has been argued that classical test theory (CTT) or generalizability theory (GT; Cronbach et al., 1963) treats the items in a study as random and, as a consequence, allows inference to a larger population of items (see also Nunnally & Bernstein, 1994; Markus & Borsboom, 2013).…”