Educational data mining can produce useful data-driven applications (e.g., early warning systems in schools or the prediction of students' academic achievement) based on predictive models. However, the class imbalance problem in educational datasets can hamper the accuracy of predictive models, as many of these models are designed on the assumption that the predicted classes are balanced. Although previous studies have proposed several methods to deal with the class imbalance problem, most of them focused on the technical details of how to improve each technique, while only a few addressed the application side, especially for data with different imbalance ratios. In this study, we compared several sampling techniques for handling different ratios of class imbalance (i.e., moderately or extremely imbalanced classifications) using the High School Longitudinal Study of 2009 dataset. For our comparison, we used random oversampling (ROS), random undersampling (RUS), and a hybrid resampling technique that combines the synthetic minority oversampling technique for nominal and continuous features (SMOTE-NC) with RUS. We used the Random Forest algorithm as our classifier to evaluate the results of each sampling technique. Our results suggest that random oversampling works best for moderately imbalanced data and hybrid resampling works best for extremely imbalanced data. The implications for educational data mining applications and suggestions for future research are discussed.
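To make the hybrid resampling idea concrete, the sketch below chains SMOTE-NC with random undersampling in an imbalanced-learn pipeline feeding a Random Forest classifier, as the abstract describes. The synthetic data, the categorical column indices, and the sampling ratios are illustrative assumptions, not the study's actual configuration.

```python
# Minimal sketch of hybrid resampling (SMOTE-NC + RUS) with a Random Forest,
# using imbalanced-learn and scikit-learn. All data and parameters are
# illustrative assumptions.
import numpy as np
from imblearn.over_sampling import SMOTENC
from imblearn.under_sampling import RandomUnderSampler
from imblearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Hypothetical mixed nominal/continuous features and a rare binary outcome.
rng = np.random.default_rng(42)
X = np.column_stack([
    rng.integers(0, 4, 2000),        # nominal feature (e.g., school type)
    rng.integers(0, 2, 2000),        # nominal feature (e.g., program flag)
    rng.normal(0, 1, 2000),          # continuous feature (e.g., GPA z-score)
    rng.normal(0, 1, 2000),          # continuous feature (e.g., test score)
])
y = (rng.random(2000) < 0.05).astype(int)   # ~5% minority class (extreme imbalance)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

# Hybrid resampling: oversample the minority class with SMOTE-NC (columns 0-1
# are nominal), then undersample the majority class with RUS.
resample_rf = Pipeline(steps=[
    ("smotenc", SMOTENC(categorical_features=[0, 1], sampling_strategy=0.5, random_state=42)),
    ("rus", RandomUnderSampler(sampling_strategy=1.0, random_state=42)),
    ("rf", RandomForestClassifier(n_estimators=500, random_state=42)),
])

resample_rf.fit(X_train, y_train)
print(classification_report(y_test, resample_rf.predict(X_test)))
```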
As universities around the world have begun to use learning management systems (LMSs), more learning data have become available to gain deeper insights into students' learning processes and make data-driven decisions to improve student learning. With the availability of rich data extracted from the LMS, researchers have turned much of their attention to learning analytics (LA) applications using educational data mining techniques. Numerous LA models have been proposed to predict student achievement in university courses. To design predictive LA models, researchers often follow a data-driven approach that prioritizes prediction accuracy while sacrificing theoretical links to learning theory and its pedagogical implications. In this study, we argue that instead of complex variables (e.g., event logs, clickstream data, timestamps of learning activities), data extracted from online formative assessments should be the starting point for building predictive LA models. Using the LMS data from multiple offerings of an asynchronous undergraduate course, we analysed the utility of online formative assessments in predicting students' final course performance. Our findings showed that the features extracted from online formative assessments (e.g., completion, timestamps and scores) served as strong and significant predictors of students' final course performance. Scores from online formative assessments were consistently the strongest predictor of student performance across the three sections.
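As an illustration of the modelling approach described above, the following sketch fits a model of final course performance on features extracted from online formative assessments (completion, timing, scores). The column names, toy data, and model choice are assumptions for demonstration and do not reproduce the study's feature engineering.

```python
# Illustrative sketch: predicting final course performance from formative
# assessment features. Data and feature names are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Hypothetical LMS export: one row per student.
lms = pd.DataFrame({
    "quiz_completion_rate": [0.9, 0.4, 1.0, 0.7],   # share of formative quizzes completed
    "mean_quiz_score":      [82, 55, 91, 68],        # average formative quiz score
    "mean_days_before_due": [2.5, 0.2, 3.1, 1.0],    # submission timing from timestamps
    "final_grade":          [88, 61, 94, 72],        # outcome to predict
})

X = lms.drop(columns="final_grade")
y = lms["final_grade"]

# With real data one would use many more students and cross-validated error;
# here we only fit and inspect which features the model relies on.
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
print(dict(zip(X.columns, model.feature_importances_.round(2))))
```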
For students, feedback received from their instructors can make a big difference in their learning by translating their assessment performance into future learning opportunities. To date, researchers have proposed various feedback literacy frameworks, which concern students' ability to interpret and use feedback for their learning, to promote feedback engagement by repositioning students as active participants in the learning process. However, current feedback literacy frameworks have not been adapted to digital or e-Assessment settings, despite the increasing use of e-Assessments (e.g., computer-based tests, intelligent tutoring systems) in practice. To address this gap, this conceptual paper introduces a feedback literacy model for e-Assessments that maps the intersection between e-Assessment features and the ecological model of feedback literacy, supporting more effective feedback practices in digital learning environments. This paper could serve as a guideline for improving feedback effectiveness and its perceived value in e-Assessment, thereby enhancing student feedback literacy.
Feedback is an essential part of educational assessment that improves student learning. As education changes with the advancement of technology, educational assessment has also adapted to the advent of Artificial Intelligence (AI). Despite the increasing use of online assessments during the last decade, few studies have discussed the feedback generation process as implemented through AI. To address this gap, we propose a conceptual paper that organizes and discusses the application of AI in the feedback generation and delivery processes. Among the different branches of AI, Natural Language Processing (NLP), Educational Data Mining (EDM), and Learning Analytics (LA) play the most critical roles in the feedback generation process. The process begins with analyzing students' data from educational assessments to build a predictive machine learning model, with additional features such as students' interaction with course material extracted using EDM methods, to predict students' learning outcomes. Written feedback can then be generated from the model with NLP-based algorithms before being delivered, along with non-verbal feedback, via an LA dashboard or a digital score report. Ethical recommendations for using AI in feedback generation are also discussed. This paper contributes to understanding the feedback generation process and serves as a starting point for the future development of digital feedback.
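The prediction-to-feedback step described above can be illustrated with a deliberately simplified sketch: a predictive model estimates a student's outcome and a template generator stands in for the NLP component. All feature names, thresholds, and message wording are hypothetical and are not drawn from the paper.

```python
# Highly simplified sketch of the prediction-to-feedback step. A rule/template
# generator stands in for the NLP component; all values are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical interaction features: [quiz_avg, hours_in_lms, assignments_done]
X_train = np.array([[0.9, 12, 8], [0.4, 3, 2], [0.7, 8, 6], [0.3, 2, 1]])
y_train = np.array([1, 0, 1, 0])          # 1 = on track to pass, 0 = at risk

risk_model = LogisticRegression().fit(X_train, y_train)

def generate_feedback(features):
    """Turn a predicted outcome into a short written feedback message."""
    p_pass = risk_model.predict_proba([features])[0, 1]
    if p_pass < 0.5:
        return (f"Predicted chance of passing: {p_pass:.0%}. "
                "Consider revisiting the recent quizzes and scheduling study time this week.")
    return (f"Predicted chance of passing: {p_pass:.0%}. "
            "Keep up your current pace; the next module builds on this material.")

print(generate_feedback([0.5, 4, 3]))
```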
Rapid guessing is an aberrant response behavior that commonly occurs in low-stakes assessments with little to no formal consequences for students. Recently, the availability of response time (RT) information in computer-based assessments has motivated researchers to develop various methods for systematically detecting rapidly guessed responses. These methods often require researchers to subjectively identify an RT threshold for each item that distinguishes rapid guessing behavior from solution behavior. In this study, we propose a data-driven approach based on random search and a genetic algorithm to find the optimal RT threshold within a predefined search space. We used response data from a low-stakes math assessment administered to over 5000 students in 658 schools across the United States. In demonstrating how to use our data-driven approach, we also compared its performance with that of existing threshold-setting methods. The results show that the proposed method can produce viable RT thresholds for detecting rapid guessing in low-stakes assessments. Moreover, compared with the other threshold-setting methods, the proposed method yielded more liberal RT thresholds, flagging a larger number of responses. Implications for practice and directions for future research are discussed.
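A minimal sketch of the random-search component is given below; the genetic-algorithm variant follows the same idea with an evolving population of candidate thresholds. The fitness criterion used here (maximizing the accuracy gap between responses faster and slower than a candidate threshold) and the simulated response data are stand-in assumptions, not the study's actual objective function.

```python
# Minimal sketch of random search for an item's RT threshold. The fitness
# criterion and the simulated data are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical single-item data: response times in seconds and correctness (0/1).
rt = np.concatenate([rng.exponential(2, 150) + 0.5,      # rapid guesses: fast, near-chance accuracy
                     rng.normal(35, 10, 850).clip(5)])    # solution behavior: slower, more accurate
correct = np.concatenate([rng.random(150) < 0.25,
                          rng.random(850) < 0.70]).astype(int)

def fitness(threshold, rt, correct):
    """Accuracy gap between slow (solution) and fast (rapid-guess) responses."""
    fast = rt < threshold
    if fast.sum() < 10 or (~fast).sum() < 10:   # ignore degenerate splits
        return -np.inf
    return correct[~fast].mean() - correct[fast].mean()

# Random search over a predefined search space (here: 1-20 seconds).
candidates = rng.uniform(1, 20, 500)
best = max(candidates, key=lambda t: fitness(t, rt, correct))
print(f"Selected RT threshold: {best:.1f} s, flagged responses: {(rt < best).sum()}")
```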