The applicability of statistical models to predicting chlorine decay remains minimally explored. This study predicted residual chlorine using six deep learning and nine machine learning techniques. The suitability of multimodel ensembles (MMEs) was investigated, including the arithmetic mean of all the models (Ens1), the average of the three best-performing models (Ens2), and the weighted mean of outputs from all 15 models (Ens3). A total of nine “goodness-of-fit” measures (such as distance correlation ($r_d$) and the Taylor skill score) were used to rank the models. The two best deep learning methods were the nonlinear autoregressive model with exogenous input (NARX) ($r_d = 0.51$) and feedforward backpropagation (FFB) ($r_d = 0.61$). The two best machine learning algorithms were random forests (RF) ($r_d = 0.64$) and Gaussian process regression (GPR) ($r_d = 0.59$). Ens2 was obtained using RF, FFB, and GPR, and it performed better than Ens1 and Ens3. Individual models explained 13–66% of the variance in observed residual chlorine, whereas the MMEs explained 51–74%, with Ens2 explaining 74%. Notably, the appropriateness of an MME depends on the approach used to combine model outputs and on the number of models considered. This study demonstrated the acceptability of statistical MMEs in predicting chlorine residual concentration in drinking water.
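
As an illustration of the three ensemble schemes described above, the following Python sketch builds an arithmetic mean of all models (Ens1), a mean of the three best-ranked models (Ens2), and a weighted mean of all models (taken here to be Ens3). It makes several assumptions that the abstract does not confirm: models are ranked by distance correlation alone rather than by all nine goodness-of-fit measures, the Ens3 weights are simply each model's distance correlation normalised to sum to one, and the function and variable names (`distance_correlation`, `build_ensembles`, `predictions`, `observed`) are hypothetical.

```python
import numpy as np


def distance_correlation(x, y):
    """Sample distance correlation between two 1-D arrays of equal length."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Pairwise absolute-distance matrices.
    a = np.abs(x[:, None] - x[None, :])
    b = np.abs(y[:, None] - y[None, :])
    # Double-centre each matrix (subtract row and column means, add grand mean).
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    dcov2 = (A * B).mean()
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    if denom == 0.0:
        return 0.0  # one of the inputs is constant
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def build_ensembles(predictions, observed, top_k=3):
    """Combine model outputs into Ens1, Ens2, and Ens3.

    predictions: dict mapping model name -> 1-D array of predicted values.
    observed:    1-D array of observed values, same length.
    """
    names = list(predictions)
    P = np.column_stack([np.asarray(predictions[n], dtype=float) for n in names])
    observed = np.asarray(observed, dtype=float)

    # Assumption: a single skill score (distance correlation) ranks the models.
    skill = np.array(
        [distance_correlation(P[:, j], observed) for j in range(P.shape[1])]
    )

    ens1 = P.mean(axis=1)                       # mean of all models
    best = np.argsort(skill)[::-1][:top_k]      # indices of the top_k models
    ens2 = P[:, best].mean(axis=1)              # mean of the best models
    weights = skill / skill.sum()               # assumption: skill-based weights
    ens3 = P @ weights                          # weighted mean of all models

    return {
        "Ens1": ens1,
        "Ens2": ens2,
        "Ens3": ens3,
        "best": [names[j] for j in best],
        "weights": dict(zip(names, weights)),
    }
```

In practice the ranking and the weights would be derived on a calibration subset and the ensembles evaluated on held-out data, so that the comparison between Ens1, Ens2, and Ens3 is not biased by reusing the same observations.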