The primary purpose of this study is to investigate the mathematical characteristics of the test reliability coefficient ρ_XX′ as a function of item response theory (IRT) parameters and to present the lower and upper bounds of the coefficient. Another purpose is to examine the relative performance of the IRT reliability statistic and two classical test theory (CTT) reliability statistics (Cronbach's alpha and the Feldt-Gilmer congeneric coefficient) under various testing conditions created by manipulating large-scale real data. For the first purpose, two alternative ways of exactly quantifying ρ_XX′ are compared in terms of computational efficiency and statistical usefulness. In addition, the lower and upper bounds for ρ_XX′ are presented in line with the assumptions of essential tau-equivalence and congeneric similarity, respectively. Empirical studies conducted for the second purpose showed that, across all testing conditions, (1) the IRT reliability coefficient was higher than the CTT reliability statistics; (2) the IRT reliability coefficient was closer to the Feldt-Gilmer coefficient than to Cronbach's alpha; and (3) the alpha coefficient was close to the lower bound of IRT reliability. Some advantages of the IRT approach to estimating test-score reliability over the CTT approaches are discussed at the end.

Keywords: Test reliability · Item response theory (IRT) · Lower and upper bounds of reliability coefficient · Test score metric versus ability score metric

Even when a test form is developed using item response theory (IRT), practitioners often use the test score (X) metric, as in classical test theory (CTT), rather than the ability score (θ) metric as the basis for reporting examinees' scores. This preference is related to the practical problem that the θ metric is often not easily understood by examinees. In this paper, the test score metric refers to the metric obtained by summing item scores, whether the items are dichotomous or polytomous.

Although not all of them assumed this context, many studies (e.g., Dimitrov 2003; Kolen et al. 1996; Lord 1980, p. 52; May and Nicewander 1994; Shojima and Toyoda 2002) have presented formulas for computing the IRT counterpart of test score reliability in CTT as a function of known item parameters and some distribution of ability. However, as described in detail later, the formulas proposed for quantifying IRT-based test score reliability (referred to simply as IRT test reliability) are not based on the same assumptions and thus do not lead to the same reliability coefficient. In fact, some of the formulas (e.g., Kolen et al. 1996; May and Nicewander 1994) exactly quantify the IRT test reliability coefficient (denoted ρ_XX′), whereas others (e.g., Dimitrov 2003; Shojima and Toyoda 2002) approximate the exact coefficient.

The primary purpose of this study is to investigate the mathematical characteristics of the IRT test reliability coefficient and to present its lower and upper bounds. For this, two alternative ways of exactly quantifying ρ_XX′ are compared in terms of computational efficiency and statistical usefulness.
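To fix ideas about what such a quantification involves, the exact coefficient on the test score metric can be written as a ratio of IRT true-score variance to observed-score variance. The expression below is an orienting sketch in generic notation, not a reproduction of any one author's formula; T(θ), P_j(θ), and g(θ) are illustrative symbols for the IRT true score, the item response functions, and an assumed ability density:

\[
\rho_{XX'} \;=\; \frac{\sigma_T^2}{\sigma_X^2}
\;=\; 1 \;-\; \frac{\int \sigma^2(X \mid \theta)\, g(\theta)\, d\theta}{\sigma_X^2},
\qquad
T(\theta) \;=\; \sum_{j=1}^{n} E(X_j \mid \theta),
\]

where, for dichotomous items under local independence, \(\sigma^2(X \mid \theta) = \sum_{j} P_j(\theta)\,[1 - P_j(\theta)]\). Every quantity on the right-hand side depends only on the item parameters (through P_j(θ)) and the assumed ability distribution g(θ), which is the sense in which ρ_XX′ is a function of IRT parameters and the distribution of ability.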