Malaysia citizens are categorised into three different income groups which are the Top 20 Percent (T20), Middle 40 Percent (M40), and Bottom 40 Percent (B40). One of the focus areas in the Eleventh Malaysia Plan (11MP) is to elevate the B40 household group towards the middle-income society. Based on recent studies by the World Bank, Malaysia is expected to enter the high-income economy status no later than the year 2024. Thus, it is essential to clarify the B40 population through a predictive classification as a prerequisite towards developing a comprehensive action plan by the government. This paper is aimed at identifying the best machine learning models using Naive Bayes, Decision Tree and k-Nearest Neighbors algorithm for classifying the B40 population. Several data pre-processing task such as data cleaning, feature engineering, normalisation, feature selection: Correlation Attribute, Information Gain Attribute and Symmetrical Uncertainty Attribute and sampling methods using SMOTE has been conducted to the raw dataset to ensure the quality of the training data. Each classifier is then optimized using different tuning parameter with 10-Fold Cross Validation for achieving the optimal values before the performance of the three classifiers are compared to each other. For the experiments, a dataset from National Poverty Data Bank called eKasih obtained from the Society Wellbeing Department, Implementation Coordination Unit of Prime Minister's Department (ICU JPM), consisting of 99,546 households from 3 different states: Johor, Terengganu and Pahang are used to train each of the machine learning model. The experimental results using 10-Fold Cross-Validation method demonstrates that the overall performance of Decision Tree model outperformed the other models and the significance test specified the result is statistically significance.
The exponential growth of todays technologies has resulted in the growth of high-throughput data with respect to both dimensionality and sample size. Therefore, efficient and effective supervision of these data becomes increasing challenging and machine learning techniques were developed with regards to knowledge discovery and recognizing patterns from these data. This paper presents machine learning tool for preprocessing tasks and a comparative study of different classification techniques in which a machine learning tasks have been employed in an experimental set up using a data set archived from the UCI Machine Learning Repository website. The objective of this paper is to analyse the impact of refined feature selection on different classification algorithms to improve the prediction of classification accuracy for room occupancy. Subsets of the original features constructed by filter or information gain and wrapper techniques are compared in terms of the classification performance achieved with selected machine learning algorithms. Three feature selection algorithms are tested, specifically the Information Gain Attribute Evaluation (IGAE), Correlation Attribute Evaluation (CAE) and Wrapper Subset Evaluation (WSE) algorithms. Following a refined feature selection stage, three machine learning algorithms are then compared, consisting the Multi-Layer Perceptron (MLP), Logistic Model Trees (LMT) and Instance Based k (IBk). Based on the feature analysis, the WSE was found to be optimal in identifying relevant features. The application of feature selection is certainly intended to obtain a higher accuracy performance. The experimental results also demonstrate the effectiveness of Instance Based k compared to other ML classifiers in providing the highest performance rate of room occupancy prediction.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.