The applicability of 5 conventional guidelines for construct measurement is critically examined: (a) construct indicators should be internally consistent for valid measures, (b) there are optimal magnitudes of correlations between items, (c) the validity of measures depends on the adequacy with which a specified domain is sampled, (d) within-construct correlations must be greater than between-construct correlations, and (e) linear composites of indicators can replace latent variables. A structural equation perspective is used to show that, without an explicit measurement model relating indicators to latent variables and measurement errors, none of these conventional beliefs holds without qualification. Moreover, a "causal" indicator model is presented that sometimes corresponds better to the relation of indicators to a construct than does the classical test theory "effect" indicator model.

Factor analysis (Spearman, 1904) and classical test theory (Lord & Novick, 1968; Spearman, 1910) have influenced perspectives on measurement not only in psychology but in most of the social sciences. These traditions have given rise to criteria for selecting "good" measures and to a number of beliefs about the way valid and reliable indicators should behave. For instance, Nunnally (1978, p. 102) warned that if correlations among measures are near zero, they measure different things. Some have argued that high correlations are better than low ones (e.g., Horst, 1966, p. 147), whereas others have claimed that moderate correlations are best (Cattell, 1965, p. 88). As the preceding example illustrates, the guidelines for indicator selection are sometimes contradictory. The result is that one can justify keeping or discarding an indicator depending on whose advice is followed. Obviously, this is an undesirable state of affairs, and it suggests that the conventional beliefs about measurement and indicator selection require clarification. We contend that these contradictions can largely be traced to two sources. The first is that some items do not conform to the classical test theory or factor analysis models, which treat indicators as effects of a construct; we present an alternative model in which indicators influence a construct. The second is the failure to present a measurement model that explicitly shows the assumed relations among constructs, measures, and errors of measurement.

We are grateful to Jane Scott-Lennox for her suggestions on several versions of the manuscript and for her help in creating Figures 1 and 2. We also thank Lewis Goldberg, Rick Hoyle, Jeff Tanaka, and Raymond Wolfe for their many helpful suggestions on drafts of this article.
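To make the contrast concrete, here is a minimal sketch of the two measurement models in standard SEM notation (the notation is assumed here; these are not equations quoted from the article). Under the classical "effect" model the latent variable causes its indicators, whereas under the "causal" model the indicators determine the latent variable:

```latex
\text{Effect indicators:}\quad x_i = \lambda_i \eta + \delta_i, \qquad i = 1, \dots, q
\qquad\qquad
\text{Causal indicators:}\quad \eta = \sum_{i=1}^{q} \gamma_i x_i + \zeta
```

The practical consequence is that effect indicators of the same construct should correlate highly, while causal indicators need not correlate at all, which is why guidelines built on internal consistency can fail for the latter.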
We propose to change the default P-value threshold for statistical significance from 0.05 to 0.005 for claims of new discoveries.

The lack of reproducibility of scientific studies has caused growing concern over the credibility of claims of new discoveries based on 'statistically significant' findings. There has been much progress toward documenting and addressing several causes of this lack of reproducibility (for example, multiple testing, P-hacking, publication bias and underpowered studies). However, we believe that a leading cause of non-reproducibility has not yet been adequately addressed: statistical standards of evidence for claiming new discoveries in many fields of science are simply too low. Associating statistically significant findings with P < 0.05 results in a high rate of false positives even in the absence of other experimental, procedural and reporting problems.

For fields where the threshold for defining statistical significance for new discoveries is P < 0.05, we propose a change to P < 0.005. This simple step would immediately improve the reproducibility of scientific research in many fields. Results that would currently be called significant but do not meet the new threshold should instead be called suggestive. While statisticians have long known the relative weakness of using P ≈ 0.05 as a threshold for discovery, and the proposal to lower it to 0.005 is not new, a critical mass of researchers now endorse this change.

We restrict our recommendation to claims of discovery of new effects. We do not address the appropriate threshold for confirmatory or contradictory replications of existing claims. We also do not advocate changes to discovery thresholds in fields that have already adopted more stringent standards (for example, genomics and high-energy physics research; see the 'Potential objections' section below). We further restrict our recommendation to studies that conduct null hypothesis significance tests.

We have diverse views about how best to improve reproducibility, and many of us believe that other ways of summarizing the data, such as Bayes factors or other posterior summaries based on clearly articulated model assumptions, are preferable to P values. However, changing the P-value threshold is simple, aligns with the training undertaken by many researchers, and might quickly achieve broad acceptance.
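As a rough, hedged illustration of the false-positive argument, the share of 'significant' findings that are false positives can be computed from the significance level, the test's power, and the prior odds that a tested effect is real. The prior odds of 1:10 and power of 0.8 below are illustrative assumptions, not figures quoted from the article:

```python
# A minimal sketch (not from the article) of why lowering the threshold
# reduces the false positive rate among 'significant' findings.
# prior_odds = 1:10 and power = 0.8 are illustrative assumptions.

def false_positive_rate(alpha, power, prior_odds=1 / 10):
    """Fraction of 'significant' findings that are false positives."""
    p_real = prior_odds / (1 + prior_odds)   # P(effect is real)
    p_null = 1 - p_real                      # P(null hypothesis is true)
    false_pos = alpha * p_null               # nulls crossing the threshold
    true_pos = power * p_real                # real effects detected
    return false_pos / (false_pos + true_pos)

for alpha in (0.05, 0.005):
    fpr = false_positive_rate(alpha, power=0.8)
    print(f"alpha = {alpha}: {fpr:.1%} of significant results are false positives")
```

Under these assumptions the false positive rate drops from roughly 38% at P < 0.05 to roughly 6% at P < 0.005. Holding power fixed across thresholds is a simplification; at a fixed sample size a stricter threshold also reduces power.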
Assessing overall model fit is an important problem in general structural equation models. One of the most widely used fit measures is Bentler and Bonett's (1980) normed index. This article has three purposes: (1) to propose a new incremental fit measure that adjusts the normed index for sample size and degrees of freedom, (2) to explain the relation between this new fit measure and existing ones, and (3) to illustrate its properties with an empirical example and a Monte Carlo simulation. The simulation suggests that the mean of the sampling distribution of the new fit measure stays at about one for different sample sizes, whereas that of the normed fit index increases with N. In addition, the standard deviation of the new measure is relatively low compared with that of some other measures (e.g., Tucker and Lewis's (1973) and Bentler and Bonett's (1980) nonnormed index). The empirical example suggests that the new fit measure is relatively stable for the same model in different samples. In sum, it appears that the new incremental measure is a useful complement to the existing fit measures.
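For reference, the abstract does not reproduce the formulas; in the standard notation commonly used for these indices (assumed here, not quoted from the article), the normed index and the adjusted incremental index are written as:

```latex
\Delta_1 = \frac{\chi^2_b - \chi^2_m}{\chi^2_b}
\qquad\text{versus}\qquad
\Delta_2 = \frac{\chi^2_b - \chi^2_m}{\chi^2_b - df_m}
```

where \(\chi^2_b\) is the baseline (null) model chi-square, \(\chi^2_m\) is the hypothesized model's chi-square, and \(df_m\) is its degrees of freedom. Since \(E(\chi^2_m) \approx df_m\) when the model is correct, subtracting \(df_m\) in the denominator is what keeps the expected value of the adjusted index near one regardless of N, consistent with the simulation result described above.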
Assessing overall fit is a topic of keen interest to structural equation modelers, yet measuring goodness of fit has been hampered by several factors. First, the assumptions that underlie the chi-square tests of model fit often are violated. Second, many fit measures (e.g., Bentler and Bonett's [1980] normed fit index) have unknown statistical distributions, so that hypothesis testing, confidence intervals, or comparisons of significant differences in these fit indices are not possible. Finally, modelers have little knowledge about the distribution and behavior of the fit measures for misspecified models or for nonnested models. Given this situation, bootstrapping techniques would appear to be an ideal means of tackling these problems. Indeed, Bentler's (1989) EQS 3.0 and Jöreskog and Sörbom's (forthcoming) LISREL 8 have bootstrap resampling options to bootstrap fit indices. In this article the authors (a) demonstrate that the usual bootstrapping methods will fail when applied to the original data, (b) explain why this occurs, and (c) propose a modified bootstrap method for the chi-square test statistic for model fit. They include simulated and empirical examples to illustrate their results.
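A minimal sketch of the modified bootstrap idea follows, assuming the correction works by first transforming the data so that the hypothesized model holds exactly in the pseudo-population and only then resampling. Here `fit_chi2` (a routine returning the model chi-square for a data matrix) and `sigma_hat` (the model-implied covariance) are hypothetical placeholders for a real SEM estimator, not part of any particular library's API:

```python
# A minimal sketch of a model-based bootstrap for the chi-square fit
# statistic. `fit_chi2` and `sigma_hat` are hypothetical placeholders.
import numpy as np
from scipy.linalg import inv, sqrtm

def modified_bootstrap_pvalue(Y, sigma_hat, fit_chi2, n_boot=500, seed=0):
    rng = np.random.default_rng(seed)
    Yc = Y - Y.mean(axis=0)                 # center the data
    S = np.cov(Y, rowvar=False)             # sample covariance
    # Rotate the data so its covariance equals the model-implied one,
    # i.e., so the null hypothesis is exactly true before resampling.
    A = np.real_if_close(inv(sqrtm(S)) @ sqrtm(sigma_hat))
    Z = Yc @ A
    t_obs = fit_chi2(Y)                     # chi-square in the original data
    n = len(Z)
    exceed = sum(
        fit_chi2(Z[rng.integers(0, n, size=n)]) >= t_obs
        for _ in range(n_boot)
    )
    return exceed / n_boot                  # bootstrap p-value
```

The design point is that a naive bootstrap resamples from data in which the model may not hold, so the bootstrap distribution of the chi-square statistic is centered in the wrong place; transforming the data first restores the null distribution that the test requires.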