Collecting mining-influenced water (MIW) quality data can result
in incomplete data sets with missing values and anomalies, making
it challenging to use the data for optimizing mine water management.
This work explores advanced statistical data analysis approaches for
addressing missing data interpolation and anomaly detection in MIW
data sets. The study compares the performance of five interpolation
techniques and four anomaly detection techniques, using supervised
and unsupervised machine learning algorithms implemented in Python
3.8.16. The results demonstrate that the radial basis
function, spline, and k-nearest-neighbors interpolation
techniques, along with the predictive confidence interval anomaly
detection approach based on gradient boosting regression trees, perform best
for missing data interpolation and anomaly detection, respectively.
Thorough application of these advanced techniques can improve the
accuracy and reliability of mine water quality data, which is crucial
for drawing conclusions about environmental safety, public health,
and effective MIW management. This paper highlights the importance
of developing effective methods for addressing missing data and anomalies
in MIW data sets, which can ultimately lead to improved treatment
plant optimization.
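
As a rough illustration of the two tasks summarized above, the following Python sketch fills artificial gaps with spline, radial basis function, and k-nearest-neighbors interpolation, and then flags anomalies that fall outside a prediction interval from gradient boosting regression trees. It is not the study's implementation: the synthetic pH and conductivity/sulfate data, all variable names, the use of quantile loss to build the interval, and every hyperparameter are assumptions chosen only to keep the example small and self-contained. It relies on NumPy, SciPy, and scikit-learn.

```python
import numpy as np
from scipy.interpolate import CubicSpline, RBFInterpolator
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)

# --- Part 1: gap filling on a synthetic, noise-free series (hypothetical pH record) ---
t = np.arange(120, dtype=float)
ph = 7.0 + 0.5 * np.sin(t / 15.0)

# Hide 15 interior samples to imitate missing observations.
gap_idx = rng.choice(np.arange(5, 115), size=15, replace=False)
observed = np.ones(t.size, dtype=bool)
observed[gap_idx] = False
t_obs, ph_obs, t_gap = t[observed], ph[observed], t[~observed]

# kNN is applied here as a regression on the time index (an assumption).
knn = KNeighborsRegressor(n_neighbors=4, weights="distance")
knn.fit(t_obs[:, None], ph_obs)

fills = {
    "spline": CubicSpline(t_obs, ph_obs)(t_gap),
    "rbf": RBFInterpolator(t_obs[:, None], ph_obs)(t_gap[:, None]),
    "knn": knn.predict(t_gap[:, None]),
}
# Largest absolute deviation from the hidden true values, per method.
max_abs_error = {
    name: float(np.max(np.abs(values - ph[~observed])))
    for name, values in fills.items()
}

# --- Part 2: prediction-interval anomaly flags (hypothetical sulfate vs. conductivity) ---
n_train, n_new = 300, 100
x_train = rng.uniform(500.0, 2000.0, n_train)
y_train = 0.4 * x_train + rng.normal(0.0, 20.0, n_train)
x_new = rng.uniform(600.0, 1900.0, n_new)
y_new = 0.4 * x_new + rng.normal(0.0, 20.0, n_new)

# Inject four large positive spikes into the new data.
injected = np.array([3, 27, 58, 91])
y_new[injected] += 300.0


def fit_quantile(alpha):
    """Fit gradient boosting trees that estimate the alpha-quantile of y given x."""
    model = GradientBoostingRegressor(
        loss="quantile", alpha=alpha, n_estimators=200, max_depth=3, random_state=0
    )
    return model.fit(x_train[:, None], y_train)


# Nominal 95% interval from the 2.5% and 97.5% quantile models,
# both trained on the anomaly-free training data.
lower = fit_quantile(0.025).predict(x_new[:, None])
upper = fit_quantile(0.975).predict(x_new[:, None])
is_anomaly = (y_new < lower) | (y_new > upper)

print("max abs interpolation error:", max_abs_error)
print("flagged points:", int(is_anomaly.sum()), "of", n_new)
```

In this sketch the quantile models are trained on anomaly-free data, so the injected spikes cannot widen the interval. With real MIW records, the choice of features, interval level, and training period would need to be validated against the data at hand.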