The primary purpose of this study is to investigate the mathematical characteristics of the test reliability coefficient ρ_XX′ as a function of item response theory (IRT) parameters and to present the lower and upper bounds of the coefficient. Another purpose is to examine the relative performance of the IRT reliability statistic and two classical test theory (CTT) reliability statistics (Cronbach's alpha and the Feldt-Gilmer congeneric coefficient) under various testing conditions created by manipulating large-scale real data. For the first purpose, two alternative ways of exactly quantifying ρ_XX′ are compared in terms of computational efficiency and statistical usefulness. In addition, the lower and upper bounds for ρ_XX′ are presented in line with the assumptions of essential tau-equivalence and congeneric similarity, respectively. Empirical studies conducted for the second purpose showed that, across all testing conditions, (1) the IRT reliability coefficient was higher than the CTT reliability statistics; (2) the IRT reliability coefficient was closer to the Feldt-Gilmer coefficient than to Cronbach's alpha; and (3) the alpha coefficient was close to the lower bound of IRT reliability. Some advantages of the IRT approach to estimating test-score reliability over the CTT approaches are discussed at the end.

Keywords: Test reliability · Item response theory (IRT) · Lower and upper bounds of reliability coefficient · Test score metric versus ability score metric

Even when a test form is developed using item response theory (IRT), practitioners often use the test score (X) metric, as in classical test theory (CTT), rather than the ability score (θ) metric as the basis for reporting examinees' scores. This preference is related to the practical problem that the θ metric is often not easily understood by examinees. In this paper, the test score metric refers to the metric obtained by summing item scores, whether the items are dichotomous or polytomous.

Although not all of them assumed this context, many studies (e.g., Dimitrov 2003; Kolen et al. 1996; Lord 1980, p. 52; May and Nicewander 1994; Shojima and Toyoda 2002) have presented formulas for computing the IRT counterpart of test score reliability in CTT as a function of known item parameters and some distribution of ability. However, as described in detail later, the formulas proposed for quantifying IRT-based test score reliability (referred to simply as IRT test reliability) are not based on the same assumptions and thus do not lead to the same reliability coefficient. In fact, some of the formulas (e.g., Kolen et al. 1996; May and Nicewander 1994) exactly quantify the IRT test reliability coefficient (denoted ρ_XX′), whereas others (e.g., Dimitrov 2003; Shojima and Toyoda 2002) approximate the exact coefficient.

The primary purpose of this study is to investigate the mathematical characteristics of the IRT test reliability coefficient and to present its lower and upper bounds. For this, two alternative ways of exactly quantifying ρ_XX′ are compared in terms of computational efficiency and statistical usefulness.
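To fix ideas about what such a quantification involves, the exact coefficient on the test score metric can be written as a ratio of IRT true-score variance to observed-score variance. The expression below is an orienting sketch in generic notation, not a reproduction of any one author's formula; T(θ), P_j(θ), and g(θ) are illustrative symbols for the IRT true score, the item response functions, and an assumed ability density:

\[
\rho_{XX'} \;=\; \frac{\sigma_T^2}{\sigma_X^2}
\;=\; 1 \;-\; \frac{\int \sigma^2(X \mid \theta)\, g(\theta)\, d\theta}{\sigma_X^2},
\qquad
T(\theta) \;=\; \sum_{j=1}^{n} E(X_j \mid \theta),
\]

where, for dichotomous items under local independence, \(\sigma^2(X \mid \theta) = \sum_{j} P_j(\theta)\,[1 - P_j(\theta)]\). Every quantity on the right-hand side depends only on the item parameters (through P_j(θ)) and the assumed ability distribution g(θ), which is the sense in which ρ_XX′ is a function of IRT parameters and the distribution of ability.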