Large datasets with high temporal resolution are becoming widely available through wireless sensors and other low-effort, automated data collection techniques. The higher the sampling frequency, the more prominent the noise becomes, appearing as unrealistic high-frequency oscillations in the observations. Machine learning techniques work well with large amounts of data, but the data must be as free from noise as possible; otherwise, the algorithm will attempt to reproduce the noise rather than predict the underlying signal. This study explores the use of four low-pass filters (Butterworth, Chebyshev I, Chebyshev II and Savitzky-Golay) for removing noise from a water quality dataset with high temporal resolution. It describes how the filters are implemented and gives advice on how to evaluate their capability to reduce noise and preserve signal features.
The method is applied to five water quality parameters from a dataset with a 5-minute resolution, collected in an urban surface water body in Bristol, United Kingdom.
The results show that, for the analysed water quality parameters (conductivity, water temperature, dissolved oxygen, and fDOM), a Butterworth filter with a cut-off frequency between 2.33E-05 Hz (~12 hours) and 4.5E-04 Hz (~6 hours) offers the best compromise between noise removal and signal preservation.
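
As an illustration of how the four filters might be applied to a series sampled every 5 minutes, the following is a minimal sketch using SciPy. It is not the implementation used in the study. The function name `apply_lowpass_filters`, the filter order, the ripple and attenuation values, the Savitzky-Golay window, the use of zero-phase (forward-backward) filtering, and the synthetic test series are all assumptions made for illustration. Only the 5-minute sampling interval and the 2.33E-05 Hz cut-off are taken from the text.

```python
import numpy as np
from scipy import signal

FS_HZ = 1.0 / (5 * 60)   # one sample every 5 minutes, as in the dataset
CUTOFF_HZ = 2.33e-5      # lower cut-off quoted in the text (~12 hours)


def apply_lowpass_filters(x, cutoff_hz=CUTOFF_HZ, fs_hz=FS_HZ, order=4,
                          ripple_db=1.0, stop_atten_db=40.0,
                          sg_window=145, sg_polyorder=3):
    """Return a dict with one smoothed copy of x per filter type.

    Every default except fs_hz and cutoff_hz is an illustrative assumption.
    """
    designs = {
        "butterworth": signal.butter(
            order, cutoff_hz, btype="low", fs=fs_hz, output="sos"),
        # ripple_db is the maximum pass-band ripple of the Chebyshev I design.
        "chebyshev_1": signal.cheby1(
            order, ripple_db, cutoff_hz, btype="low", fs=fs_hz, output="sos"),
        # For Chebyshev II the frequency argument is the stop-band edge, so
        # the same number does not yield the same pass band as the others.
        "chebyshev_2": signal.cheby2(
            order, stop_atten_db, cutoff_hz, btype="low", fs=fs_hz,
            output="sos"),
    }
    # Forward-backward filtering avoids phase lag (an assumption here).
    smoothed = {name: signal.sosfiltfilt(sos, x)
                for name, sos in designs.items()}
    # Savitzky-Golay: local polynomial fit over an odd-length window
    # (145 samples is roughly 12 hours at 5-minute resolution).
    smoothed["savitzky_golay"] = signal.savgol_filter(
        x, sg_window, sg_polyorder)
    return smoothed


def rmse(a, b):
    """Root-mean-square error between two equally sized arrays."""
    return float(np.sqrt(np.mean((np.asarray(a) - np.asarray(b)) ** 2)))


# Synthetic stand-in for a sensor record (NOT data from the study):
# one week of a daily cycle plus white noise.
rng = np.random.default_rng(0)
t_s = np.arange(7 * 288) * 300.0                       # seconds
clean = 15.0 + 2.0 * np.sin(2 * np.pi * t_s / 86400.0)
noisy = clean + rng.normal(scale=0.5, size=t_s.size)

smoothed = apply_lowpass_filters(noisy)
errors = {name: rmse(y, clean) for name, y in smoothed.items()}
```

Comparing each output against a known clean signal is only possible with synthetic data like this. For field observations the true signal is unknown, so noise removal and feature preservation have to be judged by other means, for example by inspecting the residuals and checking that peaks and diurnal patterns survive the filtering.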