Norsyela Muhammad Noor Mathivanan scite author profile

Janor

2019

IJEECS

The curse of dimensionality and the empty space phenomenon have emerged as critical problems in text classification. One way of dealing with them is to apply a feature selection technique before fitting a classification model. This helps to reduce time complexity and sometimes increases classification accuracy. This study introduces a feature selection technique using K-means clustering to overcome the weaknesses of traditional feature selection techniques such as principal component analysis (PCA), which requires a lot of time to transform all the input data. The proposed technique decides which features to retain based on the significance value of each feature in a cluster. The study found that K-means clustering helps to increase the efficiency of the KNN model for a large data set, while the KNN model without feature selection is suitable for a small data set. A comparison between K-means clustering and PCA as feature selection techniques shows that the proposed technique is better than PCA, especially in terms of computation time. Hence, K-means clustering is found to be helpful in reducing data dimensionality with less time complexity than PCA, without affecting the accuracy of the KNN model for high-frequency data.
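
The abstract above does not define the "significance value" it uses, so the following is only a minimal sketch of one way K-means can drive feature selection, not the authors' method. It assumes that features (columns) are clustered and that the feature closest to each cluster centroid is retained. The function names `kmeans` and `select_features`, the seeding rule, and the toy term-count matrix are all hypothetical.

```python
import math


def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means. Assumption: the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def select_features(rows, k):
    """Cluster the feature columns and keep one representative per cluster.

    Assumption: the representative is the feature nearest its centroid,
    a stand-in for the unspecified significance value in the abstract.
    """
    columns = [list(col) for col in zip(*rows)]  # one vector per feature
    labels, centroids = kmeans(columns, k)
    keep = []
    for c in range(k):
        members = [j for j, lab in enumerate(labels) if lab == c]
        if members:
            keep.append(
                min(members, key=lambda j: math.dist(columns[j], centroids[c]))
            )
    return sorted(keep)


# Toy document-by-feature count matrix (hypothetical): features 0 and 1
# behave alike, as do features 2 and 3.
rows = [
    [5, 5, 0, 0],
    [4, 5, 0, 1],
    [0, 0, 3, 3],
    [0, 1, 4, 3],
]
selected = select_features(rows, 2)
print(selected)
```

Unlike PCA, this kind of selection keeps a subset of the original columns rather than projecting every input onto new axes, so a classifier such as KNN can be run directly on the retained features.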

Improving Classification Accuracy Using Clustering Technique

2018

Product classification is a key issue in e-commerce domains. Many products are released to the market rapidly, and selecting the correct category in a taxonomy for each product has become a challenging task. A classification model is useful for classifying the products precisely. This study proposes a method that applies clustering prior to classification, and uses a large-scale real-world data set to assess how well the clustering technique improves the classification model. The conventional text classification procedures, namely preprocessing, feature extraction and feature selection, are applied before the clustering technique. Results show that the clustering technique improves the accuracy of the classification model. The best classification model for all three approaches, which are classification only, classification with hierarchical clustering and classification with K-means clustering, is the K-Nearest Neighbor (KNN) model. Although the accuracy of the KNN models is the same across the different approaches, the KNN model with K-means clustering had the shortest execution time. Hence, applying K-means clustering prior to the KNN model helps to reduce the computation time.

Analysis of K-Means Clustering Algorithm: A Case Study Using Large Scale E-Commerce Products

Janor

2019

Performance Analysis of Supervised Learning Models for Product Title Classification

Janor

2019

IJ-AI

Online business development through e-commerce platforms is a phenomenon that has changed how products are promoted and sold in the 21st century. Product title classification is an important task in assisting retailers and sellers to list a product in a suitable category. Product title classification is a part of the text classification problem, but the properties of product titles differ from those of general documents. This study aims to evaluate the performance of five different supervised learning models on data sets consisting of e-commerce product titles, which are very short descriptions written as incomplete sentences. The supervised learning models involved in the study are Naïve Bayes, K-Nearest Neighbor (KNN), Decision Tree, Support Vector Machine (SVM) and Random Forest. The results show that the KNN model is the best model, with the highest accuracy and the fastest computation time on the data used in the study. Hence, the KNN model is a good approach for classifying e-commerce products.
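
To illustrate how a KNN model can work on very short, incomplete product titles, here is a minimal sketch. It is not the study's pipeline: the abstract does not describe the features or distance measure used, so this example assumes lowercase word sets compared by Jaccard similarity. The functions `tokens`, `jaccard`, and `knn_predict`, and the tiny training set, are hypothetical.

```python
from collections import Counter


def tokens(title):
    """Represent a title as its set of lowercase words (assumed feature scheme)."""
    return set(title.lower().split())


def jaccard(a, b):
    """Jaccard similarity between two word sets; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def knn_predict(train, title, k=3):
    """Majority vote over the k training titles most similar to `title`."""
    query = tokens(title)
    ranked = sorted(
        train,
        key=lambda item: jaccard(query, tokens(item[0])),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labelled product titles, for illustration only.
train = [
    ("wireless bluetooth headphones black", "electronics"),
    ("usb c charging cable 1m", "electronics"),
    ("bluetooth speaker portable", "electronics"),
    ("men cotton t shirt blue", "fashion"),
    ("women running shoes size 7", "fashion"),
    ("cotton summer dress floral", "fashion"),
]

print(knn_predict(train, "portable bluetooth headphones"))
print(knn_predict(train, "blue cotton shirt"))
```

Because titles share only a handful of words, a set-overlap measure is a simple stand-in here; a weighted term representation such as TF-IDF would be a natural alternative.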

Tracing Mathematical Function of Age Specific Fertility Rate in Peninsular Malaysia

2018

IJEECS

The size, structure, and composition of a population are affected by the fertility rates at any point in time. Many researchers have used fertility rates to obtain better fertility patterns for their countries. The curve of the age-specific fertility rate is consistent in shape, which allows it to be matched with a mathematical model. This paper aimed to identify the mathematical model that best fits the recent age-specific fertility rate in Peninsular Malaysia. The study fitted the fertility data of Peninsular Malaysia from 1996 to 2014 to four mathematical models: the Hadwiger, Gamma, Beta, and Gompertz models. A comparison of the four models found that the best-fitting model is the Hadwiger model. For the data of the early 21st century, the best-fitting model tended to shift from the Hadwiger model to the Beta model. Hence, the best mathematical model for each year can be used to convert a fertility schedule classified in five-year age groups into a fertility schedule for single years of age in Peninsular Malaysia. Such a model can also be helpful for population projections that rely on limited and defective data.
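
As an illustration of how a fitted curve yields a single-year schedule, the sketch below evaluates a commonly cited form of the Hadwiger function, f(x) = (ab/c)(c/x)^(3/2) exp(-b^2 (c/x + x/c - 2)), at each age from 15 to 49. The abstract does not state which parameterization the paper used, so this form is an assumption, and the parameter values are hypothetical rather than fitted to the Peninsular Malaysia data.

```python
import math


def hadwiger(age, a, b, c):
    """Assumed Hadwiger form: (a*b/c) * (c/age)**1.5 * exp(-b**2 * (c/age + age/c - 2))."""
    return (
        (a * b / c)
        * (c / age) ** 1.5
        * math.exp(-(b ** 2) * (c / age + age / c - 2))
    )


# Hypothetical parameter values, chosen only to give a plausible-looking curve.
a, b, c = 2.0, 3.0, 29.0

# Evaluate the curve at single years of age across the reproductive span.
single_year = {age: hadwiger(age, a, b, c) for age in range(15, 50)}
peak_age = max(single_year, key=single_year.get)

for age in (20, 25, 30, 35, 40):
    print(age, round(single_year[age], 4))
print("peak age:", peak_age)
```

In practice the parameters would be estimated from the observed five-year rates, for example by least squares, before the curve is evaluated at single ages.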