Excess phosphorus (P) in wastewater can produce eutrophication, posing a serious risk to the safety of water resources and ecosystems. Therefore, effective pollutant removal including P from wastewater is the key strategy to save the environment and public health. Multi-soil-layering (MSL) is a promising nature-based technology that mainly relies on a soil mixture containing iron to remove P-pollution from wastewater. In the MSL influent, fourteen water quality indicators were measured, including pH, dissolved oxygen, total suspended solids, electrical conductivity, organic matter, nutrients, and coliform bacteria, to determine which ones have the strongest relationship with total phosphorus (TP) removal. The influence of hydraulic loading rate (HLR) and climatic variables (air temperature, rainfall, and evaporation) on the removal of TP was investigated. Four data-driven methods including multiple linear regression (MLR), k-nearest neighbors (KNN), random forest (RF), and neural network (NN) were conducted to predict TP removal at the MSL system outlet. In contrast to climatic variables, the results reveal that the HLR has a significant impact (p < 0.05) on TP removal (47% − 90%) in the MSL system. Furthermore, using a feature selection technique, the HLR, pH, PO43− and TP were suggested as the relevant input variables affecting TP removal in the MSL system, while an examination of accuracy shows that the RF model achieves good prediction accuracy (R2 = 0.93) and can help to understand MSL behavior for pollutants.