Students’ writing can provide better insight into their thinking than can multiple-choice questions. However, resource constraints often prevent faculty from using writing assessments in large undergraduate science courses. We investigated the use of computer software to analyze student writing and to uncover student ideas about chemistry in an introductory biology course. Students were asked to predict the acid–base behavior of biological functional groups and to explain their answers. Student explanations were rated by two independent raters. Responses were also analyzed using SPSS Text Analysis for Surveys and a custom library of science-related terms and lexical categories relevant to the assessment item. These analyses revealed conceptual connections made by students, student difficulties in explaining these topics, and the heterogeneity of student ideas. We validated the lexical analysis by correlating its results with findings from student interviews. We used discriminant analysis to create classification functions that identified seven key lexical categories that predict expert scoring (interrater reliability with experts = 0.899). This study suggests that computerized lexical analysis may be useful for automatically categorizing large numbers of open-ended student responses. Lexical analysis gives instructors unique insights into student thinking and a whole-class perspective that are difficult to obtain from multiple-choice questions or from reading individual responses.
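A minimal sketch of the kind of analysis this abstract describes, assuming lexical-category counts per response are already available in a table: discriminant analysis is fit on the category counts to predict the expert score. The study itself used SPSS Text Analysis for Surveys; the file name, column names, and the seven category labels below are assumptions for illustration only.

```python
# Sketch only, not the study's pipeline: scikit-learn's discriminant analysis
# stands in for the SPSS procedure; file and category names are assumed.
import pandas as pd
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

# Each row is one student response; feature columns hold lexical-category hit
# counts, and "expert_score" holds the human raters' consensus score.
data = pd.read_csv("lexical_category_counts.csv")                 # assumed file
categories = ["protonation", "pH", "charge", "hydrogen_bond",
              "functional_group", "electronegativity", "acid_base"]  # assumed labels
X = data[categories]
y = data["expert_score"]

# Discriminant analysis builds linear classification functions from the
# category counts, analogous to the classification functions reported above.
lda = LinearDiscriminantAnalysis()
print(cross_val_score(lda, X, y, cv=5).mean())  # rough estimate of agreement with raters
```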
We present a diagnostic question cluster (DQC) that assesses undergraduates' thinking about photosynthesis. This assessment tool is not designed to identify individual misconceptions. Rather, it focuses on students' abilities to apply basic concepts about photosynthesis by reasoning with a coordinated set of practices based on a few scientific principles: conservation of matter, conservation of energy, and the hierarchical nature of biological systems. We compare data on students' responses to the cluster items and on the use of some of the questions in multiple-choice, multiple-true/false, and essay formats. A cross-over study indicates that the multiple-true/false format shows promise as a machine-gradable format that identifies students who have a mixture of accurate and inaccurate ideas. In addition, interviews with students about their choices on three multiple-choice questions reveal the fragility of students' understanding. Collectively, the data show that many undergraduates lack both a basic understanding of the role of photosynthesis in plant metabolism and the ability to reason with scientific principles when learning new content. Implications for instruction are discussed.
Our study explored the prospects and limitations of using machine-learning software to score introductory biology students’ written explanations of evolutionary change. We investigated three research questions: 1) Do scoring models built using student responses at one university function effectively at another university? 2) How many human-scored student responses are needed to build scoring models suitable for cross-institutional application? 3) What factors limit computer-scoring efficacy, and how can these factors be mitigated? To answer these questions, two biology experts scored a corpus of 2556 short-answer explanations (from biology majors and nonmajors) at two universities for the presence or absence of five key concepts of evolution. Human- and computer-generated scores were compared using kappa agreement statistics. We found that machine-learning software was capable in most cases of accurately evaluating the degree of scientific sophistication in undergraduate majors’ and nonmajors’ written explanations of evolutionary change. In cases in which the software did not perform at the benchmark of “near-perfect” agreement (kappa > 0.80), we located the causes of poor performance and identified a series of strategies for their mitigation. Machine-learning software holds promise as an assessment tool for use in undergraduate biology education, but like most assessment tools, it is also characterized by limitations.
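The comparison of human- and computer-generated scores described above rests on Cohen's kappa. The short, self-contained sketch below shows that calculation with made-up presence/absence scores for one key concept in 20 responses; it is not the authors' scoring pipeline.

```python
# Illustrative kappa comparison for one key concept; the 0/1 scores are made up.
from sklearn.metrics import cohen_kappa_score

human   = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
machine = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0]

kappa = cohen_kappa_score(human, machine)
print(f"kappa = {kappa:.2f}",
      "(near-perfect, > 0.80)" if kappa > 0.80 else "(below benchmark)")
```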
This study develops a framework to conceptualize the use and evolution of machine learning (ML) in science assessment. We systematically reviewed 47 studies that applied ML in science assessment and classified them into five categories: (a) constructed response, (b) essay, (c) simulation, (d) educational game, and (e) inter-discipline. We compared the ML-based and conventional science assessments and extracted 12 critical characteristics to map three variables in a three-dimensional framework: construct, functionality, and automaticity. The 12 characteristics used to construct a profile for ML-based science assessments for each article were further analyzed by a two-step cluster analysis. The clusters identified for each variable were summarized into four levels to illustrate the evolution of each. We further conducted cluster analysis to identify four classes of assessment across the three variables. Based on the analysis, we conclude that ML has transformed, but not yet redefined, conventional science assessment practice in terms of fundamental purpose, the nature of the science assessment, and the relevant assessment challenges. Along with the three-dimensional framework, we propose five anticipated trends for incorporating ML in science assessment practice for future studies: addressing developmental cognition, changing the process of educational decision making, personalized science learning, borrowing 'good' to advance 'good', and integrating knowledge from other disciplines into science assessment.
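A hedged sketch of the clustering step described above, assuming each of the 47 reviewed studies has been profiled on the 12 characteristics. The review used a two-step cluster analysis in SPSS; the agglomerative clustering and the file/column layout here are stand-ins for illustration.

```python
# Sketch only: cluster study profiles (12 characteristic codes per study) into
# four classes, mirroring the four classes of assessment reported above.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import AgglomerativeClustering

profiles = pd.read_csv("ml_assessment_profiles.csv")    # assumed: 47 studies x 12 characteristics
features = profiles.select_dtypes("number")             # keep only the numeric characteristic codes
X = StandardScaler().fit_transform(features)

profiles["cluster"] = AgglomerativeClustering(n_clusters=4).fit_predict(X)
print(profiles.groupby("cluster").mean(numeric_only=True))  # mean characteristic profile per class
```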
Recent calls for college biology education reform have identified “pathways and transformations of matter and energy” as a big idea in biology that is crucial for students to learn. Previous work has examined how college students think about such matter-transforming processes; however, little research has investigated how students connect these ideas. Here, we probe student thinking about matter transformations in the familiar context of human weight loss. Our analysis of 1192 student constructed responses revealed three scientific (which we label “Normative”) and five less scientific (which we label “Developing”) ideas that students use to explain weight loss. Additionally, students combine these ideas in their responses, with an average of 2.19 ± 1.07 ideas per response and 74.4% of responses containing two or more ideas. These results highlight the extent to which students hold multiple (both correct and incorrect) ideas about complex biological processes. We described student responses as conforming to one of three descriptive models (Scientific, Mixed, or Developing), which had averages of 1.9 ± 0.6, 3.1 ± 0.9, and 1.7 ± 0.8 ideas per response, respectively. Such heterogeneous student thinking is characteristic of difficulties in both conceptual change and early expertise development and will require careful instructional intervention for lasting learning gains.
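To illustrate the kind of tallying described above, the sketch below codes a response for the presence of three Normative and five Developing ideas and assigns a descriptive model. The idea labels are hypothetical placeholders, not the study's coding rubric.

```python
# Illustrative sketch only: label a coded response Scientific (normative ideas
# only), Developing (developing ideas only), or Mixed (both); idea names are
# hypothetical placeholders.
NORMATIVE  = {"fat_oxidized", "mass_exhaled_as_CO2_and_H2O", "matter_conserved"}
DEVELOPING = {"matter_becomes_energy", "fat_burned_up", "mass_lost_as_sweat",
              "energy_used_up", "matter_disappears"}

def classify(ideas: set[str]) -> tuple[str, int]:
    # Returns the descriptive model and the number of coded ideas; responses
    # with no coded ideas would need separate handling in a real rubric.
    n = len(ideas & NORMATIVE)
    d = len(ideas & DEVELOPING)
    model = "Scientific" if d == 0 else ("Developing" if n == 0 else "Mixed")
    return model, n + d

print(classify({"fat_oxidized", "matter_becomes_energy"}))   # ('Mixed', 2)
```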