As the global need for groundwater to fulfill domestic, agricultural,
and industrial demands increases, we face the threat of rising
concentrations of naturally occurring contaminants in water sources
and a consequent need to improve our predictive capacity. Here,
we combine machine learning and geochemical modeling to reveal the
biogeochemical controls on regional groundwater uranium contamination
within the Central Valley, California. We use 23 environmental parameters
from a statewide groundwater geochemical database and publicly available
maps of soil and aquifer physicochemical properties to predict groundwater
uranium concentrations by random forest regression. We find that groundwater
calcium, nitrate, and sulfate concentrations, soil pH, and clay content
(weighted average between 0 and 2 m depths) are the most important
predictors of groundwater uranium concentrations. By pairing multivariate
partial dependence and accumulated local effect plots with modeled
aqueous uranium speciation and surface complexation outputs, we show
that regional groundwater uranium exceedances of the drinking water
standard (30 μg L⁻¹) depend on the formation
of uranyl–calcium–carbonato species. The geochemical
conditions leading to ternary uranyl complexes within the aquifer
are, in part, created by infiltration through the vadose zone, illustrating
the critical dependence of groundwater quality on recharge conditions.