Two unresolved implementation issues with logistic regression (LR) for differential item functioning (DIF) detection are ability purification and effect size use. Purification has been suggested to control inaccuracies in DIF detection that arise when DIF items contaminate the ability estimate. In addition, an effect size criterion may help control Type I error rates. The effectiveness of such controls, especially when used in combination, requires evaluation. Detection errors were evaluated through simulation, comparing iterative purification and no-purification procedures, each with and without an effect size criterion. Sample size, DIF magnitude and percentage, and ability differences were manipulated. Purification was beneficial under certain conditions, although overall power and Type I error rates did not substantially improve. The LR statistical test without purification performed as well as other classification criteria and may be the practical choice in many situations. Continued evaluation of the effect size guidelines and of purification is discussed.
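As a sketch of the procedure this study evaluates (not the authors' own code), the standard logistic regression DIF model regresses each item response on a matching ability estimate (typically the total score), group membership, and their interaction; the group term captures uniform DIF and the interaction captures nonuniform DIF. Under iterative purification, items flagged for DIF are removed from the matching score and the tests are rerun until the flagged set stabilizes. The function names, variable names, chi-square cutoff, and stopping rule below are illustrative assumptions.

```python
# Illustrative sketch, not the authors' implementation: logistic regression (LR)
# DIF testing with iterative purification of the matching total score.
# `responses` is a persons-by-items 0/1 array; `group` is a 0/1 indicator.
import numpy as np
import statsmodels.api as sm

def lr_dif_chi2(item, matching, group):
    """2-df chi-square comparing the matching-only model with the model that
    adds group (uniform DIF) and the matching-by-group interaction (nonuniform DIF)."""
    base = sm.add_constant(matching[:, None])
    full = sm.add_constant(np.column_stack([matching, group, matching * group]))
    m0 = sm.Logit(item, base).fit(disp=0)
    m1 = sm.Logit(item, full).fit(disp=0)
    return 2 * (m1.llf - m0.llf)

def flag_dif_with_purification(responses, group, crit=5.99, max_iter=10):
    """Re-flag items after removing previously flagged items from the matching
    total score; stop when the flagged set no longer changes (crit = chi2(2, .05))."""
    n_items = responses.shape[1]
    flagged = set()
    for _ in range(max_iter):
        keep = [j for j in range(n_items) if j not in flagged]
        matching = responses[:, keep].sum(axis=1).astype(float)
        new_flagged = {j for j in range(n_items)
                       if lr_dif_chi2(responses[:, j], matching, group) > crit}
        if new_flagged == flagged:
            break
        flagged = new_flagged
    return sorted(flagged)
```

An effect size screen of the kind examined here would additionally require the change in a pseudo-R-squared (e.g., Nagelkerke delta R-squared) between the two nested models to exceed a threshold before an item is flagged; the specific thresholds evaluated are not given in the abstract.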
The Wechsler Intelligence Scale for Children–Third Edition (WISC-III) is the most widely used test of intelligence in the world. However, the manual for the WISC-III provides insufficient detail regarding the detection of differential item functioning (DIF). The WISC-III national standardization sample (N = 2,200) was used to investigate DIF in six WISC-III subtests. After two-parameter logistic and graded response models were fit to the data, items were tested for DIF across gender groups using the item response theory likelihood ratio (IRT-LR) DIF detection method. Of the 151 items studied, 52 were found to function differently across groups. The magnitude of the DIF was also considered by examining (a) parameter differences between groups and (b) root mean squared probability differences. Because the scores of boys and girls may be composed of different items systematically scored as correct, their IQs cannot be assumed to have the same meaning. Further investigations of item content bias are recommended.
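For reference, and as one common way to write the quantities named here (the abstract does not give the exact specifications used), the two-parameter logistic model for a dichotomous item i is

\[ P_i(\theta) = \frac{1}{1 + \exp[-a_i(\theta - b_i)]}, \]

and the IRT likelihood ratio DIF test compares a compact model, with item parameters constrained equal across groups, to an augmented model that frees them:

\[ G^2 = -2\left[\ln L_{\text{compact}} - \ln L_{\text{augmented}}\right], \]

referred to a chi-square distribution with degrees of freedom equal to the number of freed parameters. A root mean squared probability difference summarizes DIF magnitude by averaging the squared gap between the reference- and focal-group response curves over the ability distribution, for example

\[ \mathrm{RMSD}_i = \sqrt{\int \left[P_{iR}(\theta) - P_{iF}(\theta)\right]^2 \phi(\theta)\, d\theta}, \]

where \(\phi(\theta)\) is a weighting density (often the focal-group ability distribution); graded response items use the analogous category response functions.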
This chapter addresses cognitive assessment of deaf children and adults. Emphasis is placed on the psychometric properties (e.g., reliability, validity, norms, item analysis) of published intelligence tests when administered to this population. The use of intelligence tests with deaf people has a long history that can be traced back to the early years of formal intelligence testing, when tests were used to identify students in need of special education due to “mental retardation.” Intelligence tests continue to serve as a primary component of the assessment process for special education. Practitioners who serve deaf children are regularly faced with the dilemma of choosing from a variety of published tests that often lack sufficient evidence of validity (i.e., that the test score represents what it claims to represent) for this population. There are several potential reasons why psychometric evidence is lacking for tests administered to deaf people. First, deaf people constitute a low-incidence population, and sample sizes sufficient for the necessary investigations are difficult to obtain. Second, the deaf population is diverse with respect to variables such as communication modality, degree of hearing loss, parental hearing loss, age of onset, etiology, presence of additional disabilities, race, gender, socioeconomic status (SES), and educational placement. Third, funding is often not available to support investigations of low-incidence populations by test publishers and independent researchers. Finally, many independent researchers may lack both the experience working with deaf people and the psychometric expertise required to conduct the necessary studies. Thus, valid cognitive assessment remains a persistent challenge for practitioners whose goals may include helping educators understand a deaf child’s intellectual abilities and educational needs.
Competing models of the factorial structure of the Pictorial Scale of Perceived Competence and Social Acceptance (PSPCSA) were tested for fit using multisample confirmatory factor analysis. The best-fitting model was tested for invariance (a) across samples of middle-class (n = 251) and economically disadvantaged (Head Start, n = 117) kindergarten children (whose ages ranged from 67 to 86 months), and (b) over time (at the end of preschool and kindergarten) for the Head Start sample. For kindergarten children, regardless of socioeconomic status, the factor structure of the PSPCSA was consistent with the 2-factor model of Competence and Acceptance. This model also fit reasonably well for Head Start children at the end of their preschool year. However, in addition to providing broad support for the dimensionality of the measure, our findings highlight important concerns about the PSPCSA.
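As a generic sketch of the multisample approach described here (the abstract does not report which constraint levels were imposed), the two-factor measurement model for group g can be written

\[ \mathbf{x}_g = \boldsymbol{\tau}_g + \boldsymbol{\Lambda}_g \boldsymbol{\xi}_g + \boldsymbol{\delta}_g, \]

with \(\boldsymbol{\xi}_g\) containing the Competence and Acceptance factors. Invariance is then examined by fitting the same configuration in both samples (configural invariance) and testing progressively stricter equality constraints, such as \(\boldsymbol{\Lambda}_1 = \boldsymbol{\Lambda}_2\) (equal loadings) and \(\boldsymbol{\tau}_1 = \boldsymbol{\tau}_2\) (equal intercepts), typically via nested chi-square difference tests and changes in fit indices; the same logic applies to the over-time comparison within the Head Start sample.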