Within the context of CAT, discrepancies between estimated and true item difficulty are attributed to four main sources: (1) random error due to sampling of test takers, that is, standard error (SE), which tends to be larger in CAT because of higher item turnover and smaller calibration samples; (2) differences across person groups, that is, differential item functioning (DIF); (3) testlet effects, a type of context effect that can arise with item sets sharing a common stimulus; and (4) automatic item generation or cloning, where variability in item parameters may be ignored for items generated from the same template. In a series of CAT simulation studies, Doebler () manipulated the first and fourth of these sources; the results showed varying amounts of person parameter bias across IRT models, estimators, test lengths, and item pool sizes. When the SE of item difficulty was simulated at .25, mean bias in person ability exceeded .50 logits under some conditions.
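The mechanism behind the first source can be illustrated with a minimal simulation sketch. This is not Doebler's actual design: it is a hypothetical Rasch-model CAT in which item selection and scoring use difficulty estimates contaminated with normally distributed calibration error (SD = the simulated SE), while item responses are generated from the true difficulties. All function names, the EAP grid, and the pool and test-length settings here are illustrative assumptions.

```python
import random
import math

random.seed(7)

# 81-point quadrature grid on [-4, 4] with a standard normal prior.
GRID = [-4.0 + 8.0 * g / 80 for g in range(81)]
PRIOR = [math.exp(-0.5 * t * t) for t in GRID]

def p(theta, b):
    # Rasch probability of a correct response.
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def eap_theta(responses, difficulties):
    # Expected a posteriori ability estimate under the N(0, 1) prior,
    # scored with the supplied (possibly error-contaminated) difficulties.
    post = PRIOR[:]
    for x, b in zip(responses, difficulties):
        for i, t in enumerate(GRID):
            q = p(t, b)
            post[i] *= q if x else (1.0 - q)
    z = sum(post)
    return sum(t * w for t, w in zip(GRID, post)) / z

def run_cat(theta_true, b_true, b_hat, test_len=20):
    used, resp, bs = set(), [], []
    theta = 0.0
    for _ in range(test_len):
        # Select the unused item whose ESTIMATED difficulty is closest to
        # the current ability estimate (maximum Rasch information); this is
        # where selection can capitalize on calibration error.
        j = min((i for i in range(len(b_hat)) if i not in used),
                key=lambda i: abs(b_hat[i] - theta))
        used.add(j)
        # Responses are generated from the TRUE difficulty.
        resp.append(1 if random.random() < p(theta_true, b_true[j]) else 0)
        bs.append(b_hat[j])  # scoring uses the contaminated values
        theta = eap_theta(resp, bs)
    return theta

def mean_bias(se_b, pool=200, persons=300, theta_true=0.0):
    # Mean (theta_hat - theta_true) over replications, with fresh
    # N(0, se_b) calibration error drawn for each simulee's pool.
    b_true = [random.gauss(0.0, 1.0) for _ in range(pool)]
    total = 0.0
    for _ in range(persons):
        b_hat = [b + random.gauss(0.0, se_b) for b in b_true]
        total += run_cat(theta_true, b_true, b_hat) - theta_true
    return total / persons

print(round(mean_bias(0.25), 3))
```

Keeping separate `b_true` (response generation) and `b_hat` (selection and scoring) arrays is the essential design choice: it lets the simulated SE enter exactly where it does operationally, through item selection and scoring, while the data-generating model stays fixed.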