This study aims to identify and evaluate a robust and replicable public health predictive model that can be applied to COVID-19 time-series data, and to compare model performance across 7-day, 14-day, and 28-day forecast intervals. A seasonal autoregressive integrated moving average (SARIMA) model was developed and validated using a Thailand COVID-19 open dataset covering 1 December 2021 to 30 April 2022, during the Omicron variant outbreak. From the top five candidates, the SARIMA model with a non-statistically significant Ljung–Box test p-value, the lowest AIC, and the lowest RMSE was selected for validation. The selected models were validated using 7-day, 14-day, and 28-day forward-chaining cross-validation. The model performance metrics for each forecast interval were evaluated and compared. The case fatality rate and mortality rate of the COVID-19 Omicron variant were estimated from the best-performing model. The study highlights that the choice of forecast interval affects model performance.
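The forward-chaining validation described in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the forecaster is a seasonal-naive stand-in rather than a fitted SARIMA model, the series is synthetic (151 points, matching the number of days from 1 December 2021 to 30 April 2022), and the names `forward_chaining_splits`, `seasonal_naive_forecast`, `evaluate`, and the initial training length of 56 days are all hypothetical choices made for illustration.

```python
import math

def forward_chaining_splits(n_obs, initial_train, horizon):
    """Yield (train_end, test_end) pairs for expanding-window validation."""
    train_end = initial_train
    while train_end + horizon <= n_obs:
        yield train_end, train_end + horizon
        train_end += horizon  # the training window grows by one horizon per fold

def seasonal_naive_forecast(history, horizon, season=7):
    """Stand-in forecaster (NOT SARIMA): repeat the last observed weekly cycle."""
    last_cycle = history[-season:]
    return [last_cycle[i % season] for i in range(horizon)]

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def evaluate(series, initial_train, horizon):
    """Average RMSE over all forward-chaining folds for one horizon."""
    scores = []
    for train_end, test_end in forward_chaining_splits(len(series), initial_train, horizon):
        forecast = seasonal_naive_forecast(series[:train_end], horizon)
        scores.append(rmse(series[train_end:test_end], forecast))
    return sum(scores) / len(scores)

# Synthetic daily counts: linear trend plus a weekly pattern (assumed data).
series = [100 + 2 * t + 10 * (t % 7) for t in range(151)]
results = {h: evaluate(series, initial_train=56, horizon=h) for h in (7, 14, 28)}
```

On this synthetic series the error grows with the horizon, because the stand-in forecaster ignores the trend; a real comparison would replace `seasonal_naive_forecast` with the selected SARIMA model refitted on each training window.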
Template matching is a digital image processing technique for finding a target object that matches a given template's pattern. This paper applies template matching to checking multiple-choice answer sheets, in order to avoid the cost of the expensive Optical Mark Reader (OMR) machine that is usually used. Moreover, the OMR machine works only with dark pencils, such as grade 2B or darker. Template matching is applied to detect the two diagonal lines of a cross mark (X-mark) separately, instead of searching for the X-mark as a whole. For this purpose, two templates representing the two diagonal lines are tested, at different orientation angles as needed. The adaptive matching approach proposed in this paper also allows a flexible size for the blank slot (the rectangular or square box in which an X-mark may be drawn), which may vary from 0.8 to 1.5 centimeters. In addition, the approach tolerates some deviation of the mark from its precise position, caused by the natural variation of human line drawing. It accepts not only a black ink pen or pencil but also any ink color that is clearly distinguishable from the color of the paper. These properties make the proposed method more flexible and practical in real use. A total of 6400 answer-sheet slots are tested with three steps of shifting and rotating. Each step includes 200 marks, each of which is acceptable as a correct X-mark. In the experiments, the proposed approach achieves 100 percent accuracy when shifted and/or rotated templates are combined, compared with only 42.8 percent without the combination. The approach also correctly detects all marks that are considered unacceptable X-marks.
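The idea of matching the two diagonals separately, with tolerance for shifted marks, can be sketched on a small binary grid. This is an illustrative sketch only, not the paper's method: it covers shifts but not rotated templates, and the function names, the 9x9 slot size, the shift range, and the 0.7 threshold are all assumptions introduced here.

```python
def diagonal_template(size, anti=False):
    """Binary size x size template with ink (1) along one diagonal."""
    template = [[0] * size for _ in range(size)]
    for r in range(size):
        c = size - 1 - r if anti else r
        template[r][c] = 1
    return template

def match_score(slot, template, dr, dc):
    """Fraction of template ink pixels that land on slot ink after shifting by (dr, dc)."""
    hits = total = 0
    for r, row in enumerate(template):
        for c, ink in enumerate(row):
            if ink:
                total += 1
                rr, cc = r + dr, c + dc
                if 0 <= rr < len(slot) and 0 <= cc < len(slot[0]) and slot[rr][cc]:
                    hits += 1
    return hits / total

def best_score(slot, template, max_shift):
    """Best match over all shifts in [-max_shift, max_shift] on both axes."""
    shifts = range(-max_shift, max_shift + 1)
    return max(match_score(slot, template, dr, dc) for dr in shifts for dc in shifts)

def is_x_mark(slot, max_shift=2, threshold=0.7):
    """Accept the slot only if BOTH diagonal templates match well enough."""
    size = len(slot)
    main = best_score(slot, diagonal_template(size), max_shift)
    anti = best_score(slot, diagonal_template(size, anti=True), max_shift)
    return main >= threshold and anti >= threshold

def draw(size, diagonals, dc=0):
    """Hypothetical helper: draw the given diagonals ('main'/'anti'), shifted dc columns."""
    slot = [[0] * size for _ in range(size)]
    for name in diagonals:
        for r, row in enumerate(diagonal_template(size, anti=(name == "anti"))):
            for c, ink in enumerate(row):
                if ink and 0 <= c + dc < size:
                    slot[r][c + dc] = 1
    return slot

shifted_x = draw(9, ["main", "anti"], dc=1)   # an X drawn one pixel off-center
single_slash = draw(9, ["anti"])              # only one stroke, not an X
```

In this toy setup, `is_x_mark(shifted_x)` accepts the off-center mark, while calling it with `max_shift=0` rejects the same mark, which loosely mirrors the abstract's point that shifted templates are needed for tolerant detection.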
The class-imbalance problem arises when the majority class contains far more data than the minority class. Traditional classifiers handle this problem poorly because they focus on the majority-class data more than on the minority-class data, and therefore tend to predict new data as belonging to the majority class. Under-sampling is an efficient way to handle the problem because it selects representatives of the majority-class data. For this reason, under-sampling requires a shorter training period than over-sampling. The one drawback of under-sampling is that selecting representatives is likely to discard important information in the majority class. To overcome this drawback, we propose a cluster-based under-sampling method. We use a clustering algorithm with a performance guarantee, the k-centers algorithm, to cluster the majority-class data and select representative data in various proportions, which are then combined with all the minority-class data to form the training set. In this paper, we compare our approach with k-means on five UCI datasets using two classifiers: 5-nearest neighbors and the C4.5 decision tree. Performance is measured by Precision, Recall, F-measure, and Accuracy. The experimental results show that our approach scores higher than the k-means approach on every measure except Precision, on which the two approaches perform equally.
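A cluster-based under-sampling step of this kind might look like the following sketch. It assumes the greedy farthest-first variant of k-centers (which carries a 2-approximation guarantee) and assumes the chosen centers themselves serve as the majority-class representatives; the abstract does not confirm either detail, and the function names and toy data are hypothetical.

```python
import math

def farthest_first_centers(points, k):
    """Greedy k-centers: start at points[0], then repeatedly add the point
    farthest from all centers chosen so far. Returns indices into points."""
    centers = [0]
    # dist[i] = distance from points[i] to its nearest chosen center
    dist = [math.dist(p, points[0]) for p in points]
    while len(centers) < k:
        nxt = max(range(len(points)), key=dist.__getitem__)
        centers.append(nxt)
        dist = [min(d, math.dist(p, points[nxt])) for d, p in zip(dist, points)]
    return centers

def undersample(majority, minority, k):
    """Keep k majority representatives plus every minority point.
    Labels: 0 = majority, 1 = minority (assumed encoding)."""
    chosen = farthest_first_centers(majority, k)
    features = [majority[i] for i in chosen] + list(minority)
    labels = [0] * len(chosen) + [1] * len(minority)
    return features, labels

# Toy data (assumed): six majority points in three loose groups, two minority points.
majority = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (20, 0)]
minority = [(5, 5), (6, 5)]
features, labels = undersample(majority, minority, k=3)
```

Varying `k` relative to the minority-class size gives the different selection proportions the abstract mentions; the resulting `features` and `labels` would then be passed to a classifier such as 5-nearest neighbors.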