The rapid emergence of new measurement instruments and methods requires practitioners and researchers across disciplines to know which statistical methods to use when comparing a new method's performance with a reference one, and how to interpret the findings properly. We discuss the common mistake of applying inappropriate correlation and regression approaches to method comparison, and then explain the concepts of agreement and reliability. We introduce the intraclass correlation coefficient as a measure of inter-rater reliability and the Bland–Altman plot as a measure of agreement, and we provide formulae for calculating them, along with illustrative examples for different study designs: a single measurement per subject, repeated measurements while the true value is constant, and repeated measurements when the true value is not constant. We emphasize the need to verify the assumptions underlying these statistical approaches, explain how to handle violations, and provide formulae for confidence intervals around the estimated values of agreement and intraclass correlation. Finally, we explain how to interpret and report the findings of these statistical analyses.
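For concreteness, the following is a minimal sketch (not the article's own formulae or code) of the two quantities described above, assuming the simplest design of two methods with one measurement per subject. The ICC variant shown is Shrout and Fleiss's ICC(2,1) for two-way random effects with absolute agreement, and all data are simulated:

# Minimal sketch: ICC(2,1) and Bland-Altman limits of agreement for
# two methods measuring the same subjects once each (simulated data).
import numpy as np

def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.
    `ratings` is an (n_subjects, k_raters) array."""
    n, k = ratings.shape
    grand = ratings.mean()
    row_means = ratings.mean(axis=1)   # per-subject means
    col_means = ratings.mean(axis=0)   # per-rater means
    # Mean squares from the two-way ANOVA decomposition.
    ms_rows = k * np.sum((row_means - grand) ** 2) / (n - 1)
    ms_cols = n * np.sum((col_means - grand) ** 2) / (k - 1)
    sse = np.sum((ratings - row_means[:, None] - col_means[None, :] + grand) ** 2)
    ms_err = sse / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)

def bland_altman(a, b):
    """Bias and 95% limits of agreement between two methods."""
    diff = a - b
    bias = diff.mean()
    half_width = 1.96 * diff.std(ddof=1)   # sample SD of the differences
    return bias, bias - half_width, bias + half_width

rng = np.random.default_rng(0)
truth = rng.normal(100, 15, size=30)
method_a = truth + rng.normal(0, 3, size=30)
method_b = truth + 1.5 + rng.normal(0, 3, size=30)   # small systematic bias
print("ICC(2,1):", icc_2_1(np.column_stack([method_a, method_b])))
print("bias, lower LoA, upper LoA:", bland_altman(method_a, method_b))

In a full Bland–Altman analysis the differences are plotted against the pairwise means and the limits of agreement drawn as horizontal lines; the numbers above are the quantities those lines represent.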
This study employs eye-tracking to investigate how first language (L1) and second language (L2) glosses affect lexical uptake and reading behaviors in L2 learners of English. The study also explores the relationship between lexical uptake and reading behaviors as a function of gloss type. To investigate this, 81 Korean university students were asked to read a baseline passage with no glosses or the same passage with glosses in the learners' L1 (Korean) or L2 (English). Their eye movements were recorded as they read, and they subsequently completed two vocabulary tests. Analyses of the eye-tracking data and vocabulary test scores revealed that the presence or absence of L1 and L2 glosses may produce differences in lexical uptake and engage dissimilar attentional mechanisms. For instance, neither L1 nor L2 glosses significantly enhanced the acquisition of visual word forms, whereas both types of glosses were significantly effective in consolidating form–meaning associations. Additionally, correlation analyses indicated that the relationship between reading behaviors and lexical acquisition may differ depending on gloss type. Ultimately, our findings provide a more comprehensive picture of L1 and L2 gloss effects and have significant implications for L2 pedagogy.
Social determinants of health, such as poverty and minority background, severely disadvantage many people with mental disorders. A variety of innovative federal, state, and local programs have combined social services with mental health interventions. To explore the potential effects of such supports on mental health outcomes, we simulated improvements in three social determinants: education, employment, and income. We used two large data sets: one from the National Institute of Mental Health containing information about people with common mental disorders such as anxiety and depression, and another from the Social Security Administration containing information about people disabled by severe mental disorders such as schizophrenia and bipolar disorder. Our simulations showed that increasing employment was significantly associated with improvements in mental health outcomes, while increasing education and income produced weak or nonsignificant associations. In general, minority groups as well as the majority group of non-Latino whites improved in the desired outcomes. We recommend that health policy leaders, state and federal agencies, and insurers provide evidence-based employment services as a standard treatment for people with mental disorders.
The Cox proportional hazards model with a latent trait variable (Ranger & Ortner, 2012, Br. J. Math. Stat. Psychol., 65, 334) has shown promise in accounting for the dependency among response times from the same examinee. The model allows flexible response-time distribution shapes through its non-parametric baseline hazard rate while permitting parametric inference about the latent variable via exponential regression. This flexibility, however, comes at the price of considerably more complex model estimation. The purpose of this study is to propose a new estimation approach that overcomes this difficulty. The new procedure is based on the penalized partial likelihood estimator, in which the partial likelihood is maximized in the presence of a penalty function. The potential of the proposed method is corroborated by a series of simulation studies fitting the proportional hazards latent trait model to psychological and educational testing data. The application of the estimation method to the hierarchical framework (van der Linden, 2007, Psychometrika, 72, 287) is also illustrated for jointly analysing response times and accuracy scores.
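As a simplified illustration of the estimator's central ingredient (not the paper's actual procedure, which treats the trait as latent), the sketch below maximizes a ridge-penalized Cox partial log-likelihood by plain gradient ascent, with observed covariates standing in for the latent variable and simulated, uncensored data:

# Hedged sketch: ridge-penalized Cox partial likelihood, Breslow form,
# assuming no tied event times. Covariates are observed here, unlike in
# the latent trait model discussed in the abstract.
import numpy as np

def penalized_partial_loglik(beta, times, events, X, lam):
    """Breslow partial log-likelihood minus a ridge penalty."""
    order = np.argsort(times)
    events, X = events[order], X[order]
    eta = X @ beta
    # Risk-set sums: for each subject, sum exp(eta) over all later-or-equal times.
    risk_sums = np.cumsum(np.exp(eta)[::-1])[::-1]
    ll = np.sum(events * (eta - np.log(risk_sums)))
    return ll - 0.5 * lam * np.sum(beta ** 2)

def gradient(beta, times, events, X, lam):
    """Gradient of the penalized partial log-likelihood."""
    order = np.argsort(times)
    events, X = events[order], X[order]
    w = np.exp(X @ beta)
    s0 = np.cumsum(w[::-1])[::-1]                          # scalar risk-set sums
    s1 = np.cumsum((w[:, None] * X)[::-1], axis=0)[::-1]   # vector risk-set sums
    return (events[:, None] * (X - s1 / s0[:, None])).sum(axis=0) - lam * beta

rng = np.random.default_rng(1)
n, p = 200, 2
X = rng.normal(size=(n, p))
true_beta = np.array([0.8, -0.5])
times = rng.exponential(1.0 / np.exp(X @ true_beta))   # hazard proportional to exp(X beta)
events = np.ones(n)                                    # all events observed (no censoring)

beta = np.zeros(p)
for _ in range(2000):                                  # plain gradient ascent
    beta += 0.001 * gradient(beta, times, events, X, lam=1.0)
print("penalized estimate:", beta)
print("objective:", penalized_partial_loglik(beta, times, events, X, lam=1.0))

The penalty term stabilizes the maximization at the cost of some shrinkage toward zero, which is the usual trade-off motivating penalized partial likelihood methods.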
Prior research has established that students often underprepare for midterm examinations yet remain overconfident in their proficiency. Research on the testing effect has demonstrated that using testing as a study strategy leads to higher performance and more accurate confidence than more common study strategies such as rereading or reviewing homework problems. We report on three experiments that explore the viability of using computer adaptive testing (CAT) for assessing students' physics proficiency, for preparing students for midterm exams by diagnosing their weaknesses, and for predicting midterm exam scores in an introductory calculus-based mechanics course for science and engineering majors. The first two experiments evaluated the reliability and validity of the CAT algorithm; in addition, we investigated the ability of the CAT test to predict performance on the midterm exam. The third experiment explored whether completing two CAT tests in the days before a midterm exam would facilitate performance on the exam. Scores on the CAT tests and the midterm exams were significantly correlated and, on average, were not statistically different from each other, providing evidence for moderate parallel-forms reliability and criterion-related validity of the CAT algorithm. When used as a diagnostic tool, CAT showed promise in helping students perform better on midterm exams. Finally, we found that the CAT tests predicted average performance on the midterm exams reasonably well; however, they were not as accurate as desired at predicting the performance of individual students. While CAT shows promise for practice testing, more research is needed to refine testing algorithms and increase reliability before implementing CAT for summative evaluations. In light of these findings, we believe that more research is needed comparing CAT to traditional paper-and-pencil practice tests to determine whether the effort required to create a CAT system is worthwhile.
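To illustrate the kind of algorithm involved (not the study's actual CAT system, whose item model is not specified here), the following is a minimal adaptive testing loop under an assumed two-parameter logistic (2PL) IRT item bank: each step administers the most informative remaining item at the current ability estimate and updates that estimate by expected a posteriori (EAP) scoring, with a simulated examinee:

# Illustrative sketch: fixed-length CAT under an assumed 2PL item bank.
import numpy as np

rng = np.random.default_rng(42)
n_items = 200
a = rng.uniform(0.8, 2.0, n_items)      # item discriminations (simulated bank)
b = rng.normal(0.0, 1.0, n_items)       # item difficulties

def p_correct(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

grid = np.linspace(-4, 4, 161)
prior = np.exp(-0.5 * grid ** 2)        # standard normal prior, unnormalised

def eap_theta(responses, a_used, b_used):
    """Expected a posteriori ability estimate on a fixed grid."""
    p = p_correct(grid[:, None], a_used[None, :], b_used[None, :])
    like = np.prod(np.where(responses[None, :] == 1, p, 1 - p), axis=1)
    post = like * prior
    return float(np.sum(grid * post) / np.sum(post))

true_theta = 0.7                        # simulated examinee's ability
available = np.ones(n_items, dtype=bool)
used, responses = [], []
theta_hat = 0.0
for _ in range(20):                     # fixed-length 20-item test
    p = p_correct(theta_hat, a, b)
    info = a ** 2 * p * (1 - p)         # Fisher information per item
    info[~available] = -np.inf          # never administer an item twice
    j = int(np.argmax(info))
    available[j] = False
    used.append(j)
    responses.append(float(rng.random() < p_correct(true_theta, a[j], b[j])))
    theta_hat = eap_theta(np.array(responses), a[used], b[used])
print(f"true theta: {true_theta}, CAT estimate: {theta_hat:.3f}")

EAP scoring is used here rather than maximum likelihood because it remains well defined for all-correct or all-incorrect response patterns, which occur routinely early in an adaptive test.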