A Test Can Have Multiple Reliabilities

Ellis, Jules L.

doi:10.1007/s11336-021-09800-2

Cited by 14 publications (12 citation statements)

References 25 publications


“…However, we think that Cronbach's alpha or its stratified variants offer conceptual advantages because they do not rely on a factor model. In contrast, they are only based on an exchangeability assumption of items (see also [43]), and items can be regarded as random instead of fixed ( [44]; but see also [45]). Unfortunately, it is frequently noted that the computation of Cronbach's alpha would require that a one-dimensional factor model must fit the data [46] without referring to the origins of the exchangeability concept behind Cronbach's alpha.…”

Section: Discussion (mentioning)

confidence: 99%
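The exchangeability argument in the quote can be made concrete with a small sketch. Coefficient alpha is computed here from nothing but an item covariance matrix; no factor model is fitted. When items are exchangeable (every variance equals v, every covariance equals c), alpha reduces to the Spearman-Brown formula with single-item reliability c / v. The function names and numbers are hypothetical illustrations, not taken from the cited works.

```python
from fractions import Fraction


def cronbach_alpha(cov):
    """Coefficient alpha from a k x k item covariance matrix (list of lists).

    alpha = k / (k - 1) * (1 - sum of item variances / variance of the sum score).
    Only the covariance matrix is used; no factor model is fitted.
    """
    k = len(cov)
    item_var_sum = sum(cov[i][i] for i in range(k))
    total_var = sum(sum(row) for row in cov)  # Var(sum score) = sum of all entries
    return Fraction(k, k - 1) * (1 - item_var_sum / total_var)


def exchangeable_cov(k, v, c):
    """Covariance matrix of k exchangeable items: common variance v, common covariance c."""
    return [[v if i == j else c for j in range(k)] for i in range(k)]


def spearman_brown(rho1, k):
    """Spearman-Brown reliability of a k-item sum, given single-item reliability rho1."""
    return k * rho1 / (1 + (k - 1) * rho1)


# Hypothetical example: 4 exchangeable items with v = 1 and c = 0.3 (exact arithmetic)
alpha_exch = cronbach_alpha(exchangeable_cov(4, Fraction(1), Fraction(3, 10)))
sb = spearman_brown(Fraction(3, 10), 4)  # single-item reliability c / v = 0.3
```

For these values both routes give 12/19 (about 0.63). With non-exchangeable items the two formulas generally diverge, but alpha itself remains computable from the covariance matrix alone.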

Is It Really More Robust? Comparing the Robustness of the Structural After Measurement (SAM) Approach to Structural Equation Modeling (SEM) Against Local Model Misspecifications with Alternative Estimation Approaches

Robitzsch

2022

Preprint


Structural equation models (SEMs), including confirmatory factor analysis as a special case, contain model parameters in the measurement part and in the structural part. In most social science SEM applications, all parameters are estimated simultaneously in a one-step approach (e.g., with maximum likelihood estimation). In a recent article, Rosseel and Loh (2022, Psychol. Methods) proposed a two-step structural after measurement (SAM) approach to SEM that estimates the parameters of the measurement model in the first step and the parameters of the structural model in the second step. Rosseel and Loh claimed that SAM is more robust to local model misspecifications (i.e., cross-loadings and residual correlations) than one-step maximum likelihood estimation. In this article, it is demonstrated with analytical derivations and simulation studies that SAM is generally not more robust to misspecifications than one-step estimation approaches. Alternative estimation methods are proposed that provide more robustness to misspecifications. However, we argue in the discussion section that applied researchers should nevertheless adopt SAM because robustness to local misspecifications is an irrelevant property when applying SAM. Parameters in a structural model are likely to be estimated without bias because researchers intentionally fit misspecified SEMs. In contrast, SEMs with some empirically driven model modifications will result in biased estimates of the structural parameters.
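A deliberately simplified two-step estimator in the spirit of SAM can illustrate why a two-step approach is not automatically shielded from local misspecifications. The sketch assumes two factors with unit variances, three indicators each, and population covariances instead of sample data. Step 1 fits each measurement block separately with the closed-form three-indicator solution; step 2 estimates the factor correlation from the cross-block covariances. This is not Rosseel and Loh's estimator or the article's simulation design; all names and parameter values are hypothetical.

```python
import math
from itertools import product


def implied_cov(loadings1, loadings2, phi, resid_var, extra=None):
    """Population covariance of six indicators under a two-factor model.

    Factor variances are fixed at 1 and the factor correlation is phi.
    extra: optional {(i, j): value} of off-diagonal residual covariances
    (a hypothetical local misspecification).
    """
    lam = [(l, 0.0) for l in loadings1] + [(0.0, l) for l in loadings2]
    p = len(lam)
    S = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            a1, a2 = lam[i]
            b1, b2 = lam[j]
            # lambda_i' Phi lambda_j with Phi = [[1, phi], [phi, 1]]
            S[i][j] = a1 * b1 + a2 * b2 + phi * (a1 * b2 + a2 * b1)
        S[i][i] += resid_var[i]
    for (i, j), v in (extra or {}).items():
        S[i][j] += v
        S[j][i] += v
    return S


def three_indicator_loadings(S, idx):
    """Step 1: closed-form one-factor loadings for exactly three indicators."""
    a, b, c = idx
    s_ab, s_ac, s_bc = S[a][b], S[a][c], S[b][c]
    return [math.sqrt(s_ab * s_ac / s_bc),
            math.sqrt(s_ab * s_bc / s_ac),
            math.sqrt(s_ac * s_bc / s_ab)]


def two_step_factor_corr(S, block1=(0, 1, 2), block2=(3, 4, 5)):
    """Step 2: average of cross-block covariances divided by step-1 loadings."""
    l1 = three_indicator_loadings(S, block1)
    l2 = three_indicator_loadings(S, block2)
    ratios = [S[i][j] / (l1[a] * l2[b])
              for (a, i), (b, j) in product(enumerate(block1), enumerate(block2))]
    return sum(ratios) / len(ratios)


# Hypothetical population values: standardized indicators, factor correlation 0.4
LOADINGS_1 = [0.8, 0.7, 0.6]
LOADINGS_2 = [0.75, 0.65, 0.55]
RESID = [1 - l ** 2 for l in LOADINGS_1 + LOADINGS_2]
S_correct = implied_cov(LOADINGS_1, LOADINGS_2, 0.4, RESID)
# Local misspecification: residual covariance 0.1 between indicators 0 and 3
S_misspec = implied_cov(LOADINGS_1, LOADINGS_2, 0.4, RESID, extra={(0, 3): 0.1})
```

Under the correct model the two-step estimate recovers 0.4. The cross-block residual covariance leaves the step-1 loadings untouched but shifts the structural estimate to 0.4 + 0.1 / (0.8 * 0.75 * 9), about 0.419, so the misspecification still reaches the second step.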



“…The test items should cover the ability domain defined by the test framework (test blueprint; see also Pellegrino & Chudowsky, 2003; Reckase, 2017). It might be legitimate to assume that there exists a larger population of test items (henceforth, labeled by I) from which the items are chosen in a particular study, and true ability values would be defined as outcomes in a study in which all items from the population would have been chosen (Cronbach & Shavelson, 2004; see also Ellis, 2021; Kane, 1982; Brennan, 2001). Interestingly, it has been argued that classical test theory (CTT) or generalizability theory (GT; Cronbach et al., 1963) treats items in a study as random and, as a consequence, allows the inference to a larger set of items in a population of items (see also Nunnally & Bernstein, 1994; Markus & Borsboom, 2013).…”

Section: Design-based or Model-based Inference For Items? (mentioning)

confidence: 99%
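Generalizability theory makes the random-items view explicit. In a crossed persons x items (p x i) design, the person variance component plays the role of true-score variance over the item universe, and the generalizability coefficient for n_i randomly sampled items is E(rho^2) = var_p / (var_p + var_pi,e / n_i). The sketch below estimates the components from the usual two-way ANOVA expected mean squares; for this design the coefficient coincides with coefficient alpha (Hoyt's identity). Function names and toy data are hypothetical.

```python
def g_study_pxi(X):
    """Variance components for a crossed persons x items design, one score per cell.

    X: list of rows (persons), each a list of item scores.
    Returns (var_p, var_i, var_pi_e) from the two-way ANOVA expected mean squares.
    ANOVA estimates can be negative in small samples; they are returned as computed.
    """
    n_p, n_i = len(X), len(X[0])
    grand = sum(map(sum, X)) / (n_p * n_i)
    person_means = [sum(row) / n_i for row in X]
    item_means = [sum(row[i] for row in X) / n_p for i in range(n_i)]
    ss_p = n_i * sum((m - grand) ** 2 for m in person_means)
    ss_i = n_p * sum((m - grand) ** 2 for m in item_means)
    ss_tot = sum((x - grand) ** 2 for row in X for x in row)
    ss_res = ss_tot - ss_p - ss_i  # person x item interaction, confounded with error
    ms_p = ss_p / (n_p - 1)
    ms_i = ss_i / (n_i - 1)
    ms_res = ss_res / ((n_p - 1) * (n_i - 1))
    var_pi_e = ms_res
    var_p = (ms_p - ms_res) / n_i
    var_i = (ms_i - ms_res) / n_p
    return var_p, var_i, var_pi_e


def g_coefficient(var_p, var_pi_e, n_items):
    """Generalizability coefficient for a score based on n_items randomly sampled items."""
    return var_p / (var_p + var_pi_e / n_items)


# Hypothetical 0/1 responses: 5 persons x 4 items
TOY = [
    [1, 0, 1, 1],
    [1, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
]
```

For TOY the coefficient with n_items = 4 is 14/27 (about 0.52), equal to alpha for the same data. Calling g_coefficient with another n_items gives the decision-study projection for a longer or shorter test drawn from the same item universe. Here the item component comes out slightly negative (-1/60), which is a known small-sample property of ANOVA estimators.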

“…such a design-based perspective, no assessment of the model fit for the set of item responses x_n is required. For example, the use of Cronbach's alpha (Cronbach, 1951) as a reliability measure for the sum score does not require that a model with equal item loadings and uncorrelated residual errors have to fit the data of item responses (Cronbach, 1951; Cronbach & Shavelson, 2004; Ellis, 2021; Meyer, 2010; Nunnally & Bernstein, 1994; Tryon, 1957). In the same manner, as for persons, resampling methods for items can be used to determine standard errors in estimated abilities (Liou & Yu, 1991; Wainer & Thissen, 1987; Wainer & Wright, 1980) by resampling items or groups of items for which abilities are reestimated (see also Michaelides & Haertel, 2014).…”

Section: Design-based or Model-based Inference For Items? (mentioning)

confidence: 99%
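Resampling over items can be sketched with a delete-one jackknife: the ability estimate is recomputed with each item (or item group) left out, and the spread of these leave-one-out estimates gives a standard error. The proportion-correct score used below is a stand-in assumption for the re-estimated abilities in the cited work, which would typically come from an IRT model; function names and responses are hypothetical.

```python
import math


def jackknife_se(units, estimator):
    """Delete-one jackknife standard error of estimator(units).

    units: list of items (or item groups); estimator maps such a list to a number.
    """
    n = len(units)
    leave_one_out = [estimator(units[:k] + units[k + 1:]) for k in range(n)]
    mean_loo = sum(leave_one_out) / n
    return math.sqrt((n - 1) / n * sum((t - mean_loo) ** 2 for t in leave_one_out))


def proportion_correct(responses):
    """Stand-in ability estimate: share of items answered correctly."""
    return sum(responses) / len(responses)


# Hypothetical 0/1 responses of one person to 10 items
RESPONSES = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]
```

For a proportion-correct score the jackknife reproduces the familiar standard error of a mean over items, sqrt(s^2 / n) with the sample variance s^2. To resample groups of items (e.g., testlets) instead, pass a list of groups and an estimator that pools them, such as `lambda groups: proportion_correct([x for g in groups for x in g])`.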

Some thoughts on analytical choices in the scaling model for test scores in international large-scale assessment studies

Robitzsch

Lüdtke

2022

Meas Instrum Soc Sci


International large-scale assessments (LSAs), such as the Programme for International Student Assessment (PISA), provide essential information about the distribution of student proficiencies across a wide range of countries. The repeated assessments of the distributions of these cognitive domains offer policymakers important information for evaluating educational reforms and receive considerable attention from the media. Furthermore, the analytical strategies employed in LSAs often define methodological standards for applied researchers in the field. Hence, it is vital to critically reflect on the conceptual foundations of analytical choices in LSA studies. This article discusses the methodological challenges in selecting and specifying the scaling model used to obtain proficiency estimates from the individual student responses in LSA studies. We distinguish design-based inference from model-based inference. It is argued that for the official reporting of LSA results, design-based inference should be preferred because it allows for a clear definition of the target of inference (e.g., country mean achievement) and is less sensitive to specific modeling assumptions. More specifically, we discuss five analytical choices in the specification of the scaling model: (1) specification of the functional form of item response functions, (2) the treatment of local dependencies and multidimensionality, (3) the consideration of test-taking behavior for estimating student ability, and the role of country differential item functioning (DIF) for (4) cross-country comparisons and (5) trend estimation. This article’s primary goal is to stimulate discussion about recently implemented changes and suggested refinements of the scaling models in LSA studies.
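The first analytical choice, the functional form of the item response function, can be illustrated with the logistic family: a common slope gives a 1PL (Rasch-type) curve, a free slope a gives the 2PL, and a lower asymptote c gives the 3PL. The item parameters below are hypothetical; the sketch only shows that the same ability maps to different response probabilities under different specifications.

```python
import math


def irf(theta, b, a=1.0, c=0.0):
    """Logistic item response function P(X = 1 | theta).

    b: difficulty, a: discrimination (slope), c: lower asymptote.
    a = 1 and c = 0 -> 1PL/Rasch-type; free a -> 2PL; c > 0 -> 3PL.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


if __name__ == "__main__":
    # Hypothetical item with difficulty b = 0.5 under three specifications
    for theta in (-1.0, 0.5, 2.0):
        p1 = irf(theta, b=0.5)
        p2 = irf(theta, b=0.5, a=2.0)
        p3 = irf(theta, b=0.5, a=2.0, c=0.2)
        print(f"theta={theta:+.1f}  1PL={p1:.3f}  2PL={p2:.3f}  3PL={p3:.3f}")
```

At theta = b every 1PL or 2PL curve passes through 0.5, whereas the 3PL curve passes through c + (1 - c) / 2, which is 0.6 for c = 0.2.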


“…Coefficient alpha has specific definitions of both true scores and error scores in the reliability context. Ellis (2021) reported that coefficient alpha can be differently interpreted using the definition of true scores in three test theories: classical test theory, generalizability theory, and latent trait theory.…”

Section: Reconceptualization Of Coefficient Alpha For Summed Scores (mentioning)

confidence: 99%

Reconceptualization of Coefficient Alpha Reliability for Test Summed and Scaled Scores

Almehrizi

2022

Educational Measurement


Coefficient alpha reliability persists as the most common reliability coefficient reported in research. The assumptions for its use are, however, not well understood. The current paper challenges the commonly used expressions of coefficient alpha and argues that while these expressions are correct when estimating reliability for summed scores, they are not appropriate for extending coefficient alpha to estimate the reliability of nonlinearly transformed scaled scores such as percentile ranks and stanines. The current paper reconceptualizes coefficient alpha as the complement of the ratio of two unbiased estimates of the summed score variance. These are the conditional summed score variance assuming uncorrelated item scores (yielding the error score variance) and the unconditional summed score variance incorporating intercorrelated item scores (yielding the observed score variance). Using this reconceptualization, a new equation of coefficient generalized alpha is introduced for scaled scores. Coefficient alpha is a special case of this new equation, since the latter reduces to coefficient alpha if the scaled scores are the summed scores themselves. Two applications (cognitive and psychological assessments) are used to compare the performance (estimation and bootstrap confidence interval) of the reliability coefficients for different scaled scores. The results support the new equation of coefficient generalized alpha, which is compared with coefficient generalized beta for parallel test forms. Coefficient generalized alpha produced different reliability values, which were larger than coefficient generalized beta for different scaled scores.
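The familiar form of alpha for summed scores already contrasts two summed-score variances: the one that would hold if item scores were uncorrelated (the sum of the item variances) and the observed one, which also contains twice the sum of the item covariances. The sketch computes both from raw item scores and combines them with the usual k / (k - 1) factor. It does not reproduce the estimators or the generalized alpha for scaled scores described in the abstract; function names and toy data are hypothetical.

```python
from statistics import variance


def summed_score_variances(X):
    """Two summed-score variances for an item score matrix X (rows = persons, columns = items).

    var_if_uncorrelated: sum of the item variances, i.e. the summed-score variance
                         that would hold if every item covariance were zero.
    var_observed: sample variance of the summed scores, which additionally
                  contains twice the sum of the item covariances.
    """
    k = len(X[0])
    var_if_uncorrelated = sum(variance([row[i] for row in X]) for i in range(k))
    var_observed = variance([sum(row) for row in X])
    return var_if_uncorrelated, var_observed


def coefficient_alpha(X):
    """Coefficient alpha: k / (k - 1) times the complement of the variance ratio."""
    k = len(X[0])
    v_unc, v_obs = summed_score_variances(X)
    return k / (k - 1) * (1 - v_unc / v_obs)


# Hypothetical 0/1 responses: 5 persons x 4 items
TOY = [
    [1, 0, 1, 1],
    [1, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
]
```

For TOY the two variances are 1.1 and 1.8, so alpha = (4/3)(1 - 1.1/1.8) = 14/27, about 0.52.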

