Detention ponds are effective structures for stormwater management in the urban drainage systems of sponge cities. In this study, pond size is the decision variable, and cost, total suspended solids (TSS), and catchment peak outflow (CPO) are the objectives for optimizing the detention pond volume. First, we randomly generated 10,000 pond areas and input them into the stormwater management model to simulate time series of outflow and suspended solids concentration; each sample combines a pond area with its corresponding cost, TSS, and CPO. Then, two backpropagation neural network models (BPNN-TSS and BPNN-CPO) were trained, tested, and evaluated for predicting TSS and CPO, respectively. We employed them as surrogates for the stormwater management model and used the non-dominated sorting genetic algorithm II (NSGA-II) to solve the optimization problem. The results showed the following: (1) The BPNN models accurately predicted TSS and CPO (determination coefficient 0.988 to 0.996, Nash–Sutcliffe efficiency 0.988 to 0.997) and efficiently substituted for stormwater management model simulations in the optimization (residuals of −18.49 to 28.10 kg and −0.45 to 0.29 m³/s). (2) For the Pareto solutions, the detention pond reduced TSS by 0 to 8.33% and CPO by 0 to 72.44% and delayed their peaks by 4 to 52 min; the reductions in TSS and CPO tend to grow as pond size increases, and the CPO reduction exhibits a minor marginal effect. (3) The surrogate-based approach reduces runtime by 90.03% while preserving the quality of the Pareto solutions, which supports its reliability.
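To make the notion of a Pareto solution concrete, the following minimal Python sketch shows the dominance test that non-dominated sorting relies on, applied to three minimized objectives (cost, TSS, CPO). It is not NSGA-II itself and not the implementation used in this study: the function names and the candidate values are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical objective vector: (cost, TSS, CPO), all to be minimized.
Objectives = Tuple[float, float, float]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is no worse than b in every objective
    and strictly better in at least one (minimization)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(points: List[Objectives]) -> List[Objectives]:
    """Keep only the points that no other point dominates."""
    return [
        p for p in points
        if not any(dominates(q, p) for q in points)
    ]


# Made-up candidate ponds for illustration only (not data from the study).
candidates: List[Objectives] = [
    (1.0, 950.0, 4.0),  # cheapest, weakest reduction
    (2.0, 930.0, 2.5),
    (2.5, 940.0, 3.0),  # worse than the previous row in all three objectives
    (3.0, 900.0, 1.5),  # most expensive, strongest reduction
]

front = non_dominated(candidates)
for cost, tss, cpo in front:
    print(f"cost={cost}, TSS={tss}, CPO={cpo}")
```

In a surrogate-based setup, the objective values of each candidate would come from the trained predictors rather than from a full simulation, which is where the runtime saving arises.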