Biochemical oxygen demand (BOD) is missing or inaccurate in many water quality data sets because of difficulties in diluting highly polluted water samples. Machine learning algorithms, particularly support vector regression (SVR), are useful for building regression models that fill gaps in these data sets. However, SVR can underpredict extreme-high values when they are few in number and underrepresented in the training data. This paper evaluates two methods, bootstrapping and data expansion, that mitigate this problem by increasing the proportion of extreme-high BOD values in the data set before the gap-filling model is trained. Both methods were tested on water quality data from Yuen Long Creek, Hong Kong, for the years 2000-2014. Both were effective in mitigating systematic underprediction and reducing residual errors when the proportion of extreme-high values in the data set was increased from 3% to 30-40%. Both methods are useful for gap filling in BOD time series because extreme-high values are often the ones that are missing or inaccurate when highly polluted samples are diluted.
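As an illustration only, the idea of raising the share of extreme-high values before training can be sketched as resampling with replacement. This is a minimal sketch under stated assumptions, not the procedure used in the paper: the function name `oversample_extremes`, the fixed `threshold` that defines "extreme-high", and the `target_fraction` parameter are all hypothetical choices made for the example.

```python
import numpy as np


def oversample_extremes(X, y, threshold, target_fraction, seed=0):
    """Resample extreme-high rows with replacement until they make up
    roughly `target_fraction` of the returned training set.

    Hypothetical helper: `threshold` and `target_fraction` are
    illustrative parameters, not values taken from the paper.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if not 0.0 < target_fraction < 1.0:
        raise ValueError("target_fraction must be in (0, 1)")

    extreme_idx = np.flatnonzero(y >= threshold)
    normal_idx = np.flatnonzero(y < threshold)
    if extreme_idx.size == 0:
        raise ValueError("no samples at or above threshold")

    # Solve n_ext / (n_ext + n_norm) >= target_fraction for n_ext.
    n_ext_needed = int(
        np.ceil(target_fraction * normal_idx.size / (1.0 - target_fraction))
    )
    # Never drop original extreme rows, even if they already exceed the target.
    n_ext_needed = max(n_ext_needed, extreme_idx.size)

    rng = np.random.default_rng(seed)
    # Keep every original extreme row and draw the remainder with replacement.
    extra = rng.choice(
        extreme_idx, size=n_ext_needed - extreme_idx.size, replace=True
    )
    idx = np.concatenate([normal_idx, extreme_idx, extra])
    return X[idx], y[idx]
```

The resampled arrays could then be passed to a regressor such as scikit-learn's `SVR`. Only the training set should be resampled, so that validation errors are still measured on the original distribution.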