Imbalanced classes are one of the challenges in classifying big data. Data disparity produces a biased model output regardless of how recent the technology is. However, deep learning algorithms such as convolutional neural networks and deep belief networks have provided promising results in many research domains, especially in image processing as well as time series forecasting, intrusion detection, and classification. Therefore, this paper investigates the effect of imbalanced class distributions in the MNIST handwritten digit dataset using convolutional neural networks and deep belief networks. Based on the experiment conducted, the results show that although these algorithms are suitable for multiple domains and have shown stability, an imbalanced data distribution is still able to affect the overall performance of the models.
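As an illustration of the kind of experiment this abstract describes, the sketch below induces a skewed class distribution on MNIST, trains a small CNN, and reports per-class recall. The framework (Keras), the network size, and the 5% retention ratio for digits 0-4 are assumptions made purely for illustration, not the paper's setup, and the DBN branch of the study is not reproduced.

```python
# Minimal sketch (not the paper's exact setup): skew MNIST and measure
# how per-class recall of a small CNN degrades on the minority digits.
import numpy as np
from tensorflow.keras import layers, models
from tensorflow.keras.datasets import mnist
from sklearn.metrics import recall_score

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train[..., None] / 255.0
x_test = x_test[..., None] / 255.0

# Keep only 5% of the training examples for digits 0-4 (assumed ratio).
rng = np.random.default_rng(0)
keep = np.ones(len(y_train), dtype=bool)
for digit in range(5):
    idx = np.flatnonzero(y_train == digit)
    keep[rng.choice(idx, size=int(0.95 * len(idx)), replace=False)] = False
x_imb, y_imb = x_train[keep], y_train[keep]

model = models.Sequential([
    layers.Conv2D(16, 3, activation="relu", input_shape=(28, 28, 1)),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(x_imb, y_imb, epochs=3, batch_size=128, verbose=0)

pred = model.predict(x_test, verbose=0).argmax(axis=1)
# Per-class recall exposes the bias toward majority classes even when
# overall accuracy still looks acceptable.
print(recall_score(y_test, pred, average=None))
```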
Imbalanced class data is a common issue in classification tasks. The deep belief network (DBN) is a promising deep learning algorithm when learning from complex feature inputs. However, when handling imbalanced class data, DBN suffers from low performance, as do other machine learning algorithms. In this paper, a genetic algorithm (GA) and bootstrap sampling are incorporated into DBN to lessen the drawbacks that occur when imbalanced class datasets are used. The performance of the proposed algorithm is compared with DBN and evaluated using standard performance metrics. The results show an improvement in performance when the evolutionary DBN with bootstrap sampling is used to handle imbalanced class datasets.
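The bootstrap-sampling component can be sketched as a class-balanced resampling step applied before training. The helper below is a hypothetical illustration of that idea only; it does not reproduce the GA-driven evolutionary part of the proposed DBN, and the balancing scheme itself is an assumption about how bootstrap sampling is applied.

```python
# Hedged sketch: resample each class with replacement up to the size of the
# largest class, producing a balanced bootstrap training set.
import numpy as np

def bootstrap_balance(X, y, rng=None):
    """Return a class-balanced bootstrap resample of (X, y)."""
    if rng is None:
        rng = np.random.default_rng(0)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    idx = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=target, replace=True)
        for c in classes
    ])
    rng.shuffle(idx)
    return X[idx], y[idx]

# Example use on flattened image data (names are illustrative):
# X_bal, y_bal = bootstrap_balance(train_images.reshape(len(train_images), -1),
#                                  train_labels)
```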
Imbalanced data is one of the challenges in classification tasks in machine learning. Data disparity produces a biased model output regardless of how recent the technology is. However, deep learning algorithms such as deep belief networks have shown promising results in many domains, especially in image processing. Therefore, in this paper, we review the effect of imbalanced class disparity using a deep belief network as the benchmark model and compare it with conventional machine learning algorithms, such as backpropagation neural networks, decision trees, naïve Bayes, and support vector machines, on the MNIST handwritten digit dataset. The experiment shows that although the algorithm is stable and suitable for multiple domains, the imbalanced data distribution still manages to affect the outcome of the conventional machine learning algorithms.
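A rough sense of how the conventional baselines react to class disparity can be had with the sketch below. Scikit-learn's small 8x8 digits set stands in for MNIST purely to keep the example fast, the DBN benchmark is omitted, and all model settings are illustrative assumptions rather than the paper's configuration.

```python
# Hedged sketch of the conventional-baseline comparison only: an MLP stands in
# for the backpropagation network, alongside a decision tree, naive Bayes and
# an SVM, evaluated with per-class recall on a skewed training split.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Skew the training split: keep only 10% of the examples for digits 0-4.
rng = np.random.default_rng(0)
keep = np.ones(len(y_tr), dtype=bool)
for digit in range(5):
    idx = np.flatnonzero(y_tr == digit)
    keep[rng.choice(idx, size=int(0.9 * len(idx)), replace=False)] = False
X_imb, y_imb = X_tr[keep], y_tr[keep]

baselines = {
    "backprop_nn": MLPClassifier(hidden_layer_sizes=(64,), max_iter=300, random_state=0),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "naive_bayes": GaussianNB(),
    "svm": SVC(),
}
for name, clf in baselines.items():
    clf.fit(X_imb, y_imb)
    # Recall on the minority digits (0-4) tends to drop the most.
    per_class = recall_score(y_te, clf.predict(X_te), average=None)
    print(name, np.round(per_class, 2))
```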