Gathering public opinion from the Internet and Internet-based applications such as Twitter has become popular in recent times, as it provides decision-makers with uncensored public views on products, government policies, and programs. Through natural language processing and machine learning techniques, unstructured data from these sources can be analyzed using traditional statistical learning. A persistent challenge in machine learning-based sentiment classification is the sheer volume of available data, which makes it difficult to train the learning algorithms in feasible time and eventually degrades their classification accuracy. The effect of training data size on classification tasks therefore cannot be overemphasized. This study statistically assessed the performance of Naive Bayes, support vector machine (SVM), and random forest algorithms on a sentiment text classification task. The research also investigated the conditions, such as varying data sizes, number of trees, and kernel types, under which each algorithm performed best. The study collected Twitter data from Ghanaian users that contained sentiments about the Ghanaian Government. The data was preprocessed and manually labeled by the researcher, and then used to train the aforementioned algorithms. These are three of the most popular learning algorithms and have had considerable success in diverse fields. The Naive Bayes classifier was adjudged the best algorithm for the task, as it outperformed the other two machine learning algorithms with an accuracy of 99%, F1 score of 86.51%, and Matthews correlation coefficient of 0.9906. The algorithm also performed well with increasing data sizes. The Naive Bayes classifier is recommended as viable for sentiment text classification, especially for text classification systems that work with Big Data.
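The three evaluation measures named in this abstract (accuracy, F1 score, and Matthews correlation coefficient) can be illustrated with a minimal sketch for the binary case. The abstract does not say how many sentiment classes were used or how the scores were computed, so the function name `classification_metrics` and the confusion-matrix counts below are purely hypothetical and do not reproduce the study's results.

```python
import math


def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, f1, mcc) from binary confusion-matrix counts.

    Hypothetical helper for illustration; not the study's code.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total

    # Precision and recall feed the F1 score (their harmonic mean).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # MCC uses all four cells, so it stays informative on imbalanced data.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, f1, mcc


# Made-up counts, chosen only to show the call.
accuracy, f1, mcc = classification_metrics(tp=40, fp=10, fn=10, tn=40)
print(f"accuracy={accuracy:.2f} f1={f1:.2f} mcc={mcc:.2f}")
```

Accuracy and F1 range from 0 to 1 (often reported as percentages), while MCC ranges from -1 to 1, which is why the abstract reports it on a different scale from the other two.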
Governments across the world rely on their Customs Administrations to provide functions that include border security, intellectual property rights protection, environmental protection, and revenue mobilisation, amongst others. Analyzing trends in the revenue collected by Customs is necessary to direct government policies and decisions. Models that can capture the trends in nominal (nonreal) tax values with respect to trade volumes (value) over the period are indispensable. Predominant amongst the existing models are the econometric models (the GDP-based model, the monthly receipts model, and the microsimulation model), which are laborious and sometimes unreliable when studying trends in time series data. In this study, we modelled monthly revenue data obtained from the Ghana Revenue Authority-Customs Division (GRA-CD) for the period January 2010 to December 2019 using two traditional time series models, the ARIMA model and the ARIMA Error Regression Model (ARIMAX), and two machine learning time series models, the Bayesian Structural Time Series (BSTS) model and a Neural Network Autoregression model. The Neural Network Autoregression model of the form NNAR (1, 3) provided the best forecasts, with the lowest Mean Squared Error (MSE) of 53.87 and a relatively low Mean Absolute Percentage Error (MAPE) of 0.08. Generally, the machine learning models (NNAR (1, 3) and BSTS) outperformed the traditional time series models (ARIMA and ARIMAX). The forecast values from the NNAR (1, 3) model indicated a potential decline in revenue, which emphasizes the need for relevant authorities to institute measures to improve revenue generation in the immediate future.
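The two forecast-error measures used to rank the models, MSE and MAPE, can be sketched as follows. This is an illustration only: the function names and the sample values are hypothetical, and the abstract does not state whether its MAPE of 0.08 is a fraction or a percentage, so the sketch returns a fraction and leaves any scaling to the caller.

```python
def mean_squared_error(actual, forecast):
    """Average of squared forecast errors (hypothetical helper)."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def mean_absolute_percentage_error(actual, forecast):
    """Average of |error / actual|, returned as a fraction.

    Assumes no actual value is zero; multiply by 100 for a percentage.
    """
    return sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


# Made-up monthly values, not GRA-CD data.
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 400.0]

print(mean_squared_error(actual, forecast))
print(mean_absolute_percentage_error(actual, forecast))
```

MSE is expressed in squared units of the series and penalizes large misses heavily, whereas MAPE is scale-free, which makes the pair a common choice when comparing models fitted to the same revenue series.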