Arsenic from geologic
sources is widespread in groundwater within
the United States (U.S.). In several areas, groundwater arsenic concentrations
exceed the U.S. Environmental Protection Agency maximum contaminant
level of 10 μg per liter (μg/L). However, this standard
applies only to public-supply drinking water, not to private supplies,
which are not federally regulated and are rarely monitored. As a result,
arsenic exposure from private wells is a potentially substantial,
but largely hidden, public health concern. Machine learning models
using boosted regression trees (BRT) and random forest classification
(RFC) techniques were developed to estimate probabilities and concentration
ranges of arsenic in private wells throughout the conterminous U.S.
Three BRT models were fit separately to estimate the probability of
private well arsenic concentrations exceeding 1, 5, or 10 μg/L,
whereas the RFC model estimates the most probable category (≤5,
>5 to ≤10, or >10 μg/L). Overall, the models perform
best at identifying areas with low concentrations of arsenic in private
wells. On the testing data, the BRT 10 μg/L model has
an overall accuracy of 91.2%, sensitivity of 33.9%, and specificity
of 98.2%. Influential variables identified across all models included
average annual precipitation and soil geochemistry. Models were developed
in collaboration with public health experts to support U.S.-based
studies focused on health effects from arsenic exposure.
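
The accuracy, sensitivity, and specificity reported above are standard confusion-matrix summaries for a binary exceedance model. The sketch below shows how such values can be computed from observed concentrations and predicted exceedance probabilities. It is illustrative only and is not the authors' evaluation code: the function name `exceedance_metrics`, the 0.5 probability cutoff, and the strict ">" exceedance test are assumptions, and the sample values in the usage lines are invented for demonstration rather than taken from the study.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryMetrics:
    accuracy: float
    sensitivity: float  # share of true exceedances that were predicted
    specificity: float  # share of true non-exceedances that were predicted


def exceedance_metrics(observed_ug_l, predicted_prob,
                       threshold_ug_l=10.0, prob_cutoff=0.5):
    """Summarize a binary exceedance model against observed concentrations.

    Assumptions (not from the source): a well counts as an exceedance when
    its concentration is strictly greater than threshold_ug_l, and a
    prediction counts as positive when its probability is >= prob_cutoff.
    """
    if len(observed_ug_l) != len(predicted_prob):
        raise ValueError("inputs must have the same length")
    if not observed_ug_l:
        raise ValueError("inputs must not be empty")

    tp = fp = tn = fn = 0
    for conc, prob in zip(observed_ug_l, predicted_prob):
        actual = conc > threshold_ug_l
        predicted = prob >= prob_cutoff
        if actual and predicted:
            tp += 1
        elif actual:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1

    total = tp + fp + tn + fn
    return BinaryMetrics(
        accuracy=(tp + tn) / total,
        sensitivity=tp / (tp + fn) if (tp + fn) else math.nan,
        specificity=tn / (tn + fp) if (tn + fp) else math.nan,
    )


# Invented demonstration data (not study data): concentrations in μg/L
# and the matching predicted probabilities of exceeding 10 μg/L.
observed = [0.5, 2.0, 12.0, 30.0, 8.0, 11.0, 1.0, 4.0, 0.2, 15.0]
probs = [0.1, 0.2, 0.7, 0.9, 0.6, 0.3, 0.05, 0.1, 0.02, 0.4]
metrics = exceedance_metrics(observed, probs)
```

When exceedances are rare in the testing data, overall accuracy is dominated by the non-exceedance class, so a model can pair high accuracy and specificity with low sensitivity, as the values reported above do.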