Aims: This research aims to group the cities/regencies on the island of Java, the seat of the central government and the most densely populated island in Indonesia, using linear discriminant analysis (LDA) and the Naïve Bayes Classifier (NBC).
Study Design: Quantitative design.
Place and Duration of Study: Sample: This study uses secondary data from the Indonesian Central Statistics Agency (Badan Pusat Statistik, BPS) on the 2022 Human Development Index (HDI) of 119 cities/regencies on the island of Java. The independent variables are four HDI indicators covering three dimensions (a long and healthy life, knowledge, and a decent standard of living), and the dependent variable is the HDI value.
Methodology: The grouping was carried out using LDA and NBC. LDA is a multivariate dependence technique, in which the variables are divided into independent variables and a dependent variable. It yields discriminant functions that assign cases to groups and identifies how the groups differ on the independent variables. NBC is a simple probabilistic prediction technique that applies Bayes' theorem (Bayes' rule) under a strong assumption of independence among the predictors.
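As a sketch of the two decision rules in their standard textbook form (not necessarily the exact formulation used in this study), let x be the vector of p indicators for one city/regency. The symbols \pi_k (prior probability of group k), \mu_k (group mean vector), and \Sigma (pooled covariance matrix) are notation introduced here for illustration.

```latex
% LDA: linear classification score for group k, assuming a common covariance matrix
d_k(x) = \mu_k^{\top} \Sigma^{-1} x - \tfrac{1}{2}\, \mu_k^{\top} \Sigma^{-1} \mu_k + \ln \pi_k

% NBC: posterior under conditional independence of the p indicators
P(G = k \mid x) \;\propto\; \pi_k \prod_{j=1}^{p} P(x_j \mid G = k)

% Both rules assign x to the group with the largest value
\hat{k} = \arg\max_k d_k(x) \quad \text{(LDA)}, \qquad
\hat{k} = \arg\max_k P(G = k \mid x) \quad \text{(NBC)}
```

The LDA score allows the indicators to be correlated through \Sigma, whereas the NBC product treats them as independent within each group.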
Results: Both LDA and NBC can be used for prediction and classification. The discriminant analysis produced three discriminant functions that assign cities/regencies on the island of Java to three HDI groups according to the largest discriminant score, with life expectancy contributing most to distinguishing the groups. In the NBC analysis, the prior probabilities were 0.211 for the very high HDI group, 0.606 for the high HDI group, and 0.183 for the medium HDI group. LDA classified cities/regencies based on the 2022 HDI indicators with an accuracy of 72.92%, compared with 64.58% for NBC.
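To illustrate how prior probabilities and an accuracy rate of the kind reported above arise in an NBC analysis, the following is a minimal sketch in plain Python. It assumes a Gaussian likelihood for each indicator, which the study does not state, and it runs on a small made-up data set with two hypothetical indicator columns, not on the BPS data. The function names are illustrative only.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate the prior and per-feature (mean, variance) for each class."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)

    model = {}
    for label, rows in rows_by_class.items():
        prior = len(rows) / len(y)  # share of training cases in this class
        stats = []
        for column in zip(*rows):
            mean = sum(column) / len(column)
            # Small constant keeps the variance strictly positive.
            var = sum((v - mean) ** 2 for v in column) / len(column) + 1e-9
            stats.append((mean, var))
        model[label] = (prior, stats)
    return model


def predict(model, row):
    """Return the class with the largest log posterior (up to a constant)."""
    best_label, best_score = None, -math.inf
    for label, (prior, stats) in model.items():
        score = math.log(prior)
        for value, (mean, var) in zip(row, stats):
            # Log of the Gaussian density; features are treated as independent.
            score += -0.5 * math.log(2 * math.pi * var)
            score -= (value - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, X, y):
    """Fraction of cases whose predicted class matches the true class."""
    hits = sum(predict(model, row) == label for row, label in zip(X, y))
    return hits / len(y)


# Hypothetical toy data (two made-up indicators per region), for illustration only.
X_train = [[75.0, 11.0], [74.5, 10.5], [72.0, 8.5], [71.5, 8.0],
           [72.5, 9.0], [69.0, 6.5], [68.5, 6.0]]
y_train = ["very high", "very high", "high", "high",
           "high", "medium", "medium"]

model = fit_gaussian_nb(X_train, y_train)
priors = {label: prior for label, (prior, _) in model.items()}
```

Here `priors` holds the class proportions of the toy training set, and `accuracy(model, X_test, y_test)` would give the share of correctly classified cases on a held-out set, the same kind of quantity as the accuracy rates reported for LDA and NBC.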
Conclusion: In this case, LDA is a better classification method than NBC. It is also important to identify the regions in the medium class so that stakeholders can take further action.