In classification analysis, the target (dependent) variable is often influenced not only by ratio-scale variables but also by qualitative (nominal-scale) variables. The majority of machine learning techniques accept only numerical inputs, so categorical variables must first be converted into numerical values using encoding techniques. If a variable has no relation or order among its values, assigning numbers to them will mislead the machine learning technique. This paper presents a modified k-nearest-neighbors algorithm that calculates distance values for categorical (nominal) variables without encoding them. A student academic performance dataset is used to test the enhanced algorithm. The results show that the proposed algorithm outperforms the standard one, which needs nominal variables to be encoded before the distance between them can be calculated: it performs 14% better in accuracy and is not sensitive to outliers.
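The idea of measuring distance on nominal values directly can be illustrated with a minimal sketch. The abstract does not state which distance the authors use, so the simple match/mismatch (overlap) measure below is an assumption, as are the function names `mixed_distance` and `knn_predict` and the toy records.

```python
from collections import Counter
from math import sqrt


def mixed_distance(a, b, nominal_idx):
    """Distance between two records that mix numeric and nominal fields.

    Numeric fields contribute a squared difference. Nominal fields
    contribute 0 when the labels match and 1 when they differ, so no
    artificial ordering is imposed on the categories.
    (Assumed measure; the paper's actual formula is not given here.)
    """
    total = 0.0
    for i, (x, y) in enumerate(zip(a, b)):
        if i in nominal_idx:
            total += 0.0 if x == y else 1.0
        else:
            total += (x - y) ** 2
    return sqrt(total)


def knn_predict(train_X, train_y, query, nominal_idx, k=3):
    """Majority vote among the k training records closest to the query."""
    ranked = sorted(
        zip(train_X, train_y),
        key=lambda pair: mixed_distance(pair[0], query, nominal_idx),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy records: (scaled study hours, guardian, school area)
train_X = [
    (0.9, "mother", "urban"),
    (0.8, "mother", "rural"),
    (0.2, "father", "urban"),
    (0.1, "other", "rural"),
    (0.7, "father", "urban"),
]
train_y = ["pass", "pass", "fail", "fail", "pass"]
NOMINAL = {1, 2}  # positions of the nominal fields

prediction = knn_predict(train_X, train_y, (0.75, "mother", "urban"), NOMINAL, k=3)
print(prediction)
```

With an integer encoding such as mother=0, father=1, other=2, "mother" would appear closer to "father" than to "other", although no such order exists; the match/mismatch term avoids that.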
Distance learning has made learning possible for those who cannot attend traditional courses, especially during pandemic periods. This type of learning, however, faces the challenge of keeping students engaged and interested. Furthermore, it is important to identify students who need help to ensure that their progress does not deteriorate. First, the research identifies students’ engagement based on their behaviors in the Virtual Learning Environment (VLE) and their performance in assessments. The goal of this research is to investigate the relationship between demographic characteristics and engagement level. It identifies less engaged students by using an unsupervised clustering model based on features derived from VLE interactions and assessment submissions. According to the results, the two-level clustering model outperforms other models with regard to cluster separation, as measured by the silhouette coefficient. The Apriori algorithm is used to obtain a set of rules that connect demographic features to student engagement. The results show that gender, highest education, studied credits, and number of previous attempts are positively correlated with engagement level in distance-based learning.
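Rules of the kind mined with Apriori are usually filtered by support and confidence. The sketch below computes these two measures for a single rule; it is not a full Apriori implementation, and the item labels and records are hypothetical, not taken from the study's data.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) over the transactions."""
    both = support(transactions, set(antecedent) | set(consequent))
    ante = support(transactions, antecedent)
    return both / ante if ante > 0 else 0.0


# Hypothetical student records, one set of attribute labels per student
students = [
    {"edu=HE", "attempts=0", "engaged=high"},
    {"edu=HE", "attempts=1", "engaged=high"},
    {"edu=lower", "attempts=0", "engaged=low"},
    {"edu=HE", "attempts=0", "engaged=low"},
]

rule_support = support(students, {"edu=HE", "engaged=high"})
rule_confidence = confidence(students, {"edu=HE"}, {"engaged=high"})
print(rule_support, rule_confidence)
```

A rule such as "edu=HE implies engaged=high" would be kept only if both values exceed thresholds chosen by the analyst.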
Although online learning has become very popular, especially during epidemic crises, it has the drawback of high dropout and low completion rates. Institutes search for ways to support their students’ learning and increase completion rates. With an analytical strategy, institutes can predict students’ performance and intervene in time. Yet efficient prediction and proactive intervention depend on meaningful, reliable, and accurate data. Institutes use different tools, such as a Virtual Learning Environment (VLE), for teaching and content delivery. These tools provide large databases that are useful for research on predicting students’ performance. In this study, VLE data from an Open University course is analyzed to investigate whether weekly engagement alone, weekly engagement integrated with assessment scores (first approach), and data on previous assessments accumulated up to a certain week (second approach) lead to accurate prediction of student performance. The study highlights the importance of VLE data, which sheds light on students’ behaviour and leads to models that can predict students’ outcomes accurately. The second approach generated robust prediction models that outperformed the results obtained using the first approach.
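One way to read "accumulated up to a certain week" is that each prediction uses only the data available by the end of that week. The sketch below builds such a feature row under that assumption; the function name `features_up_to`, the feature names, and the numbers are hypothetical and do not come from the study.

```python
from itertools import accumulate


def features_up_to(week, weekly_clicks, assessments):
    """Build a feature dict from data available by the end of `week`.

    weekly_clicks: list of VLE click counts, index 0 = week 1.
    assessments:   list of (due_week, score) pairs.
    """
    cumulative_clicks = list(accumulate(weekly_clicks))
    # Only assessments due on or before this week are visible
    scores = [score for due_week, score in assessments if due_week <= week]
    return {
        "clicks_this_week": weekly_clicks[week - 1],
        "clicks_so_far": cumulative_clicks[week - 1],
        "assessments_so_far": len(scores),
        "mean_score_so_far": sum(scores) / len(scores) if scores else None,
    }


# Hypothetical student: five weeks of clicks, three assessments
clicks = [40, 25, 0, 60, 10]
assessments = [(2, 70.0), (4, 80.0), (8, 55.0)]

row = features_up_to(4, clicks, assessments)
print(row)
```

Rows like this, one per student and week, could then be passed to any classifier; the assessment due in week 8 is excluded here because it would not yet be known in week 4.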