Large datasets are a prerequisite for high performance, generalization, and accuracy in machine learning applications such as prediction and anomaly detection. However, collecting data is time-consuming, difficult, and expensive, and many of the available datasets are small or imbalanced. These challenges are evident in financial and banking services, pharmaceuticals and healthcare, manufacturing and the automotive industry, robotic cars, sensor time-series data, and many other domains. To overcome them, researchers in many domains are increasingly interested in generating synthetic data. Generating synthetic time-series data is far more complicated and expensive than generating synthetic tabular data. The primary objective of this paper is to generate synthetic multivariate time-series data (for continuous and mixed parameters) that are comparable to, and evaluated against, real multivariate time-series data. To this end, a novel GAN architecture named MTS-TGAN is proposed, trained to produce such data, and assessed using both qualitative measures, namely t-SNE, PCA, and discriminative and predictive scores, and quantitative measures, for which an RNN model is implemented that calculates MAE and MSLE scores for three training phases: Train Real Test Real, Train Real Test Synthetic, and Train Synthetic Test Real. The model reduces the overall error by up to 13% and 10% in predictive and discriminative scores, respectively. The research objectives are met, and the outcomes demonstrate that MTS-TGAN captures the distribution and underlying knowledge contained in the attributes of the real data and can serve as a starting point for further research in this area.
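The MAE and MSLE scores and the three train/test phases named in the abstract above can be illustrated with a minimal Python sketch. It is not the paper's implementation: `mean_predictor` is a toy stand-in for the RNN, and `score_regimes` and the argument names are hypothetical, introduced only to show how the same two metrics would be computed under Train Real Test Real (TRTR), Train Real Test Synthetic (TRTS), and Train Synthetic Test Real (TSTR).

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> None:
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error: average of |true - predicted|."""
    _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def msle(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared logarithmic error; every value must be greater than -1."""
    _check(y_true, y_pred)
    return sum(
        (math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred)
    ) / len(y_true)


# Hypothetical interface standing in for the RNN: given training targets and a
# horizon, return that many predictions.
Predictor = Callable[[Sequence[float], int], List[float]]


def mean_predictor(train: Sequence[float], horizon: int) -> List[float]:
    """Toy baseline (an assumption, not the paper's model): predict the training mean."""
    avg = sum(train) / len(train)
    return [avg] * horizon


def score_regimes(
    real_train: Sequence[float],
    real_test: Sequence[float],
    synth_train: Sequence[float],
    synth_test: Sequence[float],
    predictor: Predictor = mean_predictor,
) -> Dict[str, Tuple[float, float]]:
    """Return (MAE, MSLE) for each of the TRTR, TRTS and TSTR phases."""
    regimes = {
        "TRTR": (real_train, real_test),    # train on real, test on real
        "TRTS": (real_train, synth_test),   # train on real, test on synthetic
        "TSTR": (synth_train, real_test),   # train on synthetic, test on real
    }
    scores: Dict[str, Tuple[float, float]] = {}
    for name, (train, test) in regimes.items():
        preds = predictor(train, len(test))
        scores[name] = (mae(test, preds), msle(test, preds))
    return scores
```

In this kind of comparison, TSTR scores close to the TRTR scores would suggest that the synthetic data carry much of the predictive signal of the real data.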
A region’s population growth inevitably results in higher water consumption, and this persistent rise in water use increases the region’s wastewater production. As the volume of wastewater (influent) grows, Wastewater Treatment Plants (WWTPs) must run effectively to meet the large demand for treated water (effluent). Knowing the influent and effluent parameters in advance increases operational efficiency and enables cost-effective use of resources at wastewater treatment plants. This paper forecasts an influent quality parameter, namely total MLD, and effluent quality parameters, namely MPN, BOD, DO, COD and pH, from real-time data collected before, during and after COVID-19 at the Bharwara WWTP in Lucknow, India. It is the largest UASB-based wastewater treatment facility in Uttar Pradesh and the second largest in Asia. We propose a novel model, wPred, comprising extensions of SARIMA with seasonal order and ANN-based ML models to estimate the influent and effluent quality parameters, respectively, and compare it with existing machine learning models. The lowest sMAPE for the influent parameters using wPred is 2.59%. The findings show a strong correlation (R-value) of up to 0.99 between the measured and predicted effluent parameters. The proposed model therefore has an acceptable level of accuracy and generalizability and efficiently forecasts the performance of the Bharwara WWTP.
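For readers unfamiliar with the two evaluation figures quoted in the abstract above, the following Python sketch shows how sMAPE and a Pearson correlation coefficient are commonly computed. The abstract does not say which sMAPE variant is used; the version here, with the mean of the absolute actual and forecast values in the denominator, is an assumption, and the function names are illustrative.

```python
import math
from typing import Sequence


def smape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Symmetric MAPE in percent (assumed variant: denominator (|A| + |F|) / 2)."""
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for a, f in zip(actual, forecast):
        denom = (abs(a) + abs(f)) / 2.0
        # A pair of zeros is a perfect forecast; count it as zero error.
        total += 0.0 if denom == 0 else abs(f - a) / denom
    return 100.0 * total / len(actual)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between measured and predicted values."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)
```

With this definition sMAPE ranges from 0% to 200%, so a value of 2.59% indicates forecasts that are close to the measured values on average.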