In this study, we analyze the systematic error arising from false positives in the Force Concept Inventory (FCI). We compare the systematic errors of question 6 (Q.6), Q.7, and Q.16, for which clearly erroneous reasoning has been found, with that of Q.5, for which no clearly erroneous reasoning has been found. We use subquestions to determine whether a correct response to a given FCI question is a false positive. In addition to the 30 original questions, subquestions were introduced for Q.5, Q.6, Q.7, and Q.16. This modified version of the FCI was administered to 1145 university students in Japan from 2015 to 2017. We find that the systematic errors of Q.6, Q.7, and Q.16 are much larger than that of Q.5 for students with mid-level FCI scores. Furthermore, we find that, averaged over the data sample, the sum of the false positives from Q.5, Q.6, Q.7, and Q.16 is about 10% of the FCI score of a mid-level student.
We aim to quantify the systematic error of the Force Concept Inventory (FCI). A modified version of the FCI was administered to 500 university students in Japan in 2015. In addition to the 30 original questions, subquestions were introduced for three questions that, according to prior research, elicit false positives from students (Q.6, Q.7, and Q.16), as well as for Q.5. Using logistic regression with the results of Q.5 and its subquestions, we estimate the systematic error arising from the remaining 26 questions. Our results indicate that the FCI true score can be less than half of the FCI raw score for Japanese students.
We analyze the measurement error due to false positives in the Force Concept Inventory (FCI), focusing on four questions (Q.5, Q.6, Q.7, and Q.16). We use subquestions to determine whether a correct response to a given FCI question is a false positive. Using data from 1145 university students in Japan collected from 2015 to 2017, we find that the sum of the systematic errors from the false positives of Q.5, Q.6, Q.7, and Q.16 is about 10% of the FCI score of a mid-level student. We consider to what degree this error influences two measures of the effectiveness of a course, namely, the average normalized gain and Cohen's d. Using a set of simulated data, we show that Cohen's d is less sensitive to the systematic error due to false positives than the average normalized gain.
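As a minimal sketch of the two course-effectiveness measures named above, the following Python computes the class-average normalized gain and Cohen's d from pre- and post-test scores. It uses the standard definitions: the gain is (post mean - pre mean) / (30 - pre mean), and Cohen's d is the difference of the means divided by the pooled standard deviation. The helper names, the score lists, and the assumption that false positives inflate every student's score by the same one point are hypothetical illustrations; they are not the simulated data or the error model of the study.

```python
from math import sqrt
from statistics import mean, stdev

FCI_MAX = 30  # the FCI has 30 questions


def average_normalized_gain(pre, post, max_score=FCI_MAX):
    """Class-average normalized gain: (<post> - <pre>) / (max - <pre>)."""
    pre_mean, post_mean = mean(pre), mean(post)
    return (post_mean - pre_mean) / (max_score - pre_mean)


def cohens_d(pre, post):
    """Cohen's d using the pooled sample standard deviation."""
    n1, n2 = len(pre), len(post)
    s1, s2 = stdev(pre), stdev(post)
    pooled = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (mean(post) - mean(pre)) / pooled


def remove_constant_offset(scores, offset):
    """Toy correction (assumption): subtract the same number of
    false-positive points from every student's raw score."""
    return [s - offset for s in scores]


# Hypothetical raw scores for five students, before and after a course.
pre_raw = [10, 12, 14, 16, 18]
post_raw = [16, 18, 20, 22, 24]

# Hypothetical "true" scores: one false-positive point removed per student.
pre_true = remove_constant_offset(pre_raw, 1)
post_true = remove_constant_offset(post_raw, 1)

gain_raw = average_normalized_gain(pre_raw, post_raw)
gain_true = average_normalized_gain(pre_true, post_true)
d_raw = cohens_d(pre_raw, post_raw)
d_true = cohens_d(pre_true, post_true)

print(f"normalized gain: raw={gain_raw:.4f}, corrected={gain_true:.4f}")
print(f"Cohen's d:       raw={d_raw:.4f}, corrected={d_true:.4f}")
```

In this toy case the constant offset cancels in both the mean difference and the standard deviations, so Cohen's d is unchanged, while the normalized gain shifts because its denominator depends on the pre-test mean. A score-dependent false-positive rate would generally change both measures, so this sketch only illustrates one reason the two measures can respond differently.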