Polycyclic aromatic hydrocarbons (PAHs) are a complex group of
environmental contaminants, many of which have long environmental half-lives.
As these compounds degrade, structural changes can make the products
substantially more mutagenic than the parent compound.
Over time, a single PAH can degrade into several
thousand unique transformation products, creating a complex, constantly
evolving set of intermediates. Microbial degradation is the primary
mechanism of their transformation and ultimate removal from the environment,
and this process can result in mutagenic activation similar to the
metabolic activation that can occur in multicellular organisms. The
diversity of the potential intermediate structures in PAH-contaminated
environments renders hazard assessment difficult for both remediation
professionals and regulators. A mixture of structural and energetic
descriptors has proven effective in existing studies for classifying
which PAH transformation products will be mutagenic. However, most
existing studies of environmental PAH mutagens focus on
nitrogenated derivatives, which are prevalent in the atmosphere but
less relevant in soil. Additionally, PAH products commonly found
in the environment range in size from five rings down to
a single ring, requiring a broadly inclusive methodology to comprehensively
evaluate mutagenic potential. We developed a combination of supervised
and unsupervised machine learning methods to predict environmentally
induced PAH mutagenicity with improved performance over currently
available tools. K-means clustering with principal component analysis
allows us to identify molecular clusters that we hypothesize to have
similar mechanisms of action. Recursive feature elimination identifies
the most influential descriptors. The cluster-specific regression
outperforms available classifiers in predicting direct-acting mutagens
resulting from the microbial biodegradation of PAHs and provides direction
for future studies evaluating the environmental hazards resulting
from PAH biodegradation.
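
The workflow described above (clustering in principal component space,
recursive feature elimination, then a separate model per cluster) can be
illustrated with a minimal sketch. This is not the authors' implementation:
it assumes scikit-learn, a synthetic descriptor matrix in place of real
structural and energetic descriptors, logistic regression as the per-cluster
model, and arbitrary choices of three components, three clusters, and four
retained descriptors. The helper name predict_mutagenic is hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# ASSUMPTION: synthetic stand-in for a descriptor matrix
# (rows = transformation products, columns = molecular descriptors,
#  y = 1 for mutagenic, 0 for non-mutagenic).
X, y = make_classification(
    n_samples=300, n_features=12, n_informative=5, random_state=0
)

# Standardize descriptors, project onto principal components,
# and group compounds with k-means in the reduced space.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
pca = PCA(n_components=3, random_state=0)
X_pca = pca.fit_transform(X_scaled)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X_pca)

# Fit one model per cluster; recursive feature elimination keeps the
# descriptors that matter most within that cluster.
cluster_models = {}
for c in np.unique(cluster_ids):
    mask = cluster_ids == c
    if len(np.unique(y[mask])) < 2:
        continue  # a single-class cluster cannot train a classifier
    selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=4)
    selector.fit(X_scaled[mask], y[mask])
    cluster_models[c] = selector


def predict_mutagenic(X_new):
    """Assign each compound to a cluster, then apply that cluster's model.

    Returns 1/0 predictions, or -1 where the cluster has no fitted model.
    """
    X_new_scaled = scaler.transform(X_new)
    ids = kmeans.predict(pca.transform(X_new_scaled))
    out = np.full(len(X_new), -1)
    for c, model in cluster_models.items():
        mask = ids == c
        if mask.any():
            out[mask] = model.predict(X_new_scaled[mask])
    return out


for c, model in cluster_models.items():
    kept = np.flatnonzero(model.support_)  # indices of retained descriptors
    print(f"cluster {c}: retained descriptor indices {kept.tolist()}")
```

Fitting the selector inside each cluster, rather than once on the whole
data set, lets the retained descriptors differ between clusters, which is
consistent with the hypothesis that each cluster reflects a distinct
mechanism of action.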