The term PFAS encompasses diverse per- and polyfluorinated alkyl (and
increasingly aromatic) chemicals spanning industrial processes,
commercial uses, environmental occurrence, and potential concerns.
As chemical curation has grown, with the PFASSTRUCTV5 inventory on
EPA’s CompTox Chemicals Dashboard currently exceeding 14,000 structures,
so has the motivation to profile, categorize, and analyze
the PFAS structure space using modern cheminformatics approaches.
Making use of the publicly available ToxPrint chemotypes and ChemoTyper
application, we have developed a new PFAS-specific fingerprint set
consisting of 129 TxP_PFAS chemotypes coded in CSRML, a chemical-based
XML-query language. These are split into two groups. The first contains
56 mostly bond-type ToxPrints modified to require attachment to
either a CF group or an F atom, enforcing proximity to the fluorinated
portion of the chemical. This focus resulted in a dramatic reduction
in TxP_PFAS chemotype counts relative to the corresponding ToxPrint
counts (averaging 54%). The remaining 73 TxP_PFAS chemotypes consist
of various lengths and types of fluorinated chains, rings, and bonding
patterns covering indications of branching, alternate halogenation,
and fluorotelomers. Both groups of chemotypes are well represented
across the PFASSTRUCT inventory. Using the ChemoTyper application,
we show how the TxP_PFAS chemotypes can be visualized, filtered, and
used to profile the PFASSTRUCT inventory, as well as to construct
chemically intuitive, structure-based PFAS categories. Lastly, we
used a selection of expert-based PFAS categories from the OECD Global
PFAS list to evaluate a small set of analogous structure-based TxP_PFAS
categories. The TxP_PFAS chemotypes recapitulated the expert-based
PFAS category concepts using clearly defined structure rules that
can be computationally implemented and reproducibly applied to process
large PFAS inventories without the need to consult an expert. The TxP_PFAS
chemotypes have the potential to support computational modeling, harmonize
PFAS structure-based categories, facilitate communication, and allow
for more efficient and chemically informed exploration of PFAS chemicals
moving forward.
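
To illustrate what "clearly defined structure rules" applied to a
fingerprint could look like in practice, the following is a minimal
Python sketch, not the implementation used in this work. It assumes each
chemical has already been reduced to the set of chemotypes it matches
(for example, from a fingerprint file exported by a chemotyping tool).
All chemotype names, category names, and chemical identifiers below are
hypothetical placeholders and do not correspond to actual TxP_PFAS
identifiers or OECD categories.

```python
from typing import Dict, FrozenSet, List, NamedTuple, Set


class CategoryRule(NamedTuple):
    """A structure-based category defined by chemotype presence/absence."""
    name: str
    required: FrozenSet[str]  # every one of these chemotypes must match
    excluded: FrozenSet[str]  # none of these chemotypes may match


def assign_categories(
    fingerprints: Dict[str, Set[str]],
    rules: List[CategoryRule],
) -> Dict[str, List[str]]:
    """Return, for each chemical, the names of all rules it satisfies."""
    assignments: Dict[str, List[str]] = {}
    for chem_id, matched in fingerprints.items():
        assignments[chem_id] = [
            rule.name
            for rule in rules
            if rule.required <= matched and not (rule.excluded & matched)
        ]
    return assignments


# Hypothetical chemotype names, for illustration only.
CHAIN = "hyp_perfluoro_chain"
COOH = "hyp_carboxylic_acid_near_CF"
SO3H = "hyp_sulfonic_acid_near_CF"
ETHER = "hyp_ether_near_CF"

# Hypothetical category rules: required chemotypes, then excluded ones.
rules = [
    CategoryRule(
        "hyp_perfluoroalkyl_carboxylic_acids",
        frozenset({CHAIN, COOH}),
        frozenset({ETHER}),
    ),
    CategoryRule(
        "hyp_perfluoroalkyl_sulfonic_acids",
        frozenset({CHAIN, SO3H}),
        frozenset(),
    ),
]

# Hypothetical chemicals, each given as its set of matched chemotypes.
fingerprints = {
    "chem_A": {CHAIN, COOH},
    "chem_B": {CHAIN, SO3H},
    "chem_C": {CHAIN, COOH, ETHER},  # excluded from the acid category by the ether
}

result = assign_categories(fingerprints, rules)
for chem_id, categories in result.items():
    print(chem_id, categories)
```

Because each rule is only a pair of required and excluded chemotype sets,
the same rule list can be reapplied unchanged to any inventory for which
fingerprints are available, which is the reproducibility property the
text describes. A chemical that satisfies no rule, such as chem_C here,
is simply left uncategorized.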