Receptor-mediated molecular initiating events (MIEs) and their
relevance in endocrine activity (EA) have been highlighted in the literature.
More than 15 receptors have been associated with neurodevelopmental
adversity and metabolic disruption. MIEs describe chemical interactions
with defined biological outcomes, a relationship that could be described
with quantitative structure–activity relationship (QSAR) models.
QSAR uncertainty can be assessed using the conformal prediction (CP)
framework, which provides, for each prediction, similarity (i.e.,
nonconformity) scores relative to the defined classes. CP calibration can
indirectly mitigate data imbalance during model development, and the
nonconformity scores serve as intrinsic measures for assessing the chemical
applicability domain during screening. The focus of this work was to
propose an in silico predictive strategy for EA.
First, 23 QSAR models for MIEs associated with EA were developed using
high-throughput data for 14 receptors. To handle the data imbalance,
five protocols were compared, and CP provided the most balanced class
definition. Second, the developed QSAR models were applied to a large
data set (∼55,000 chemicals), comprising chemicals representative
of potential risk for human exposure. Using CP, it was possible to
assess the uncertainty of the screening results and identify model
strengths and out-of-domain chemicals. Last, two clustering methods,
t-distributed stochastic neighbor embedding and Tanimoto similarity,
were used to identify compounds with potential EA using known endocrine
disruptors as reference. The cluster overlap between methods produced
23 chemicals with suspected or demonstrated EA potential. The presented
models could be utilized for first-tier screening and identification
of compounds with potential biological activity across the studied
MIEs.
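
To make the CP step concrete, the following is a minimal sketch of class-conditional (Mondrian) inductive conformal prediction, not the implementation used in this work. It assumes a nonconformity score of one minus the predicted class probability and a binary active/inactive labeling. The function names, toy calibration values, and significance level are hypothetical and chosen only for illustration.

```python
def class_p_values(cal_probs, cal_labels, test_probs):
    """Return one conformal p-value per class for a single test compound.

    cal_probs:  predicted probability of the TRUE class for each calibration compound
    cal_labels: true class of each calibration compound
    test_probs: dict mapping each class to its predicted probability for the test compound
    """
    p_values = {}
    for label, prob in test_probs.items():
        # Mondrian calibration: compare only against calibration compounds of
        # the same class, so the minority class is calibrated separately.
        cal_alpha = [1.0 - p for p, y in zip(cal_probs, cal_labels) if y == label]
        alpha = 1.0 - prob  # assumed nonconformity score: 1 - class probability
        n_at_least = sum(1 for a in cal_alpha if a >= alpha)
        p_values[label] = (n_at_least + 1) / (len(cal_alpha) + 1)
    return p_values


def prediction_set(p_values, significance):
    """Keep every class whose p-value exceeds the chosen significance level."""
    return {label for label, p in p_values.items() if p > significance}


# Toy calibration set (hypothetical values): four actives, four inactives.
cal_probs = [0.9, 0.8, 0.6, 0.3, 0.95, 0.9, 0.85, 0.7]
cal_labels = ["active"] * 4 + ["inactive"] * 4

p_vals = class_p_values(cal_probs, cal_labels, {"active": 0.85, "inactive": 0.15})
labels = prediction_set(p_vals, significance=0.25)
print(p_vals, labels)
```

Under these assumptions, a prediction set containing a single class is a confident call, a set containing both classes signals that the model cannot discriminate at the chosen significance level, and an empty set indicates a compound that conforms to neither class and could be treated as outside the applicability domain.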