Machine learning (ML) models for screening endocrine-disrupting
chemicals (EDCs), such as thyroid stimulating hormone receptor (TSHR)
agonists, are essential for sound management of chemicals. Previous
models for screening TSHR agonists were built on imbalanced datasets
and lacked applicability domain (AD) characterization essential for
regulatory application. Herein, an updated TSHR agonist dataset was
built, for which the ratio of active to inactive compounds greatly
increased to 1:2.6, and chemical spaces of structure–activity
landscapes (SALs) were enhanced. The resulting models, based on 7 molecular
representations and 4 ML algorithms, outperformed previous
ones. Weighted similarity density (ρs) and weighted
inconsistency of activities (IA) were
proposed to characterize the SALs, and a state-of-the-art AD characterization
methodology ADSAL{ρs, IA} was established. An optimal classifier developed with
PubChem fingerprints and the random forest algorithm, coupled with
ADSAL{ρs ≥ 0.15, IA ≤ 0.65}, performed well on the validation
set, with an area under the receiver operating characteristic curve
of 0.984 and a balanced accuracy of 0.941, and identified 90 TSHR
agonist classes that could not be found previously. The classifier
together with the ADSAL{ρs, IA} may serve as efficient tools for screening EDCs, and
the AD characterization methodology may be applied to other ML models.
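
The AD gating and the two reported metrics can be illustrated with a minimal sketch. It assumes that ρs and IA have already been computed for each compound, since their formulas are not given here; only the thresholds (ρs ≥ 0.15, IA ≤ 0.65) come from the text. The toy data, the 0.5 decision cutoff, and all function and variable names are hypothetical and are not the authors' implementation.

```python
# Thresholds quoted in the text; everything else below is illustrative.
RHO_S_MIN = 0.15  # minimum weighted similarity density
I_A_MAX = 0.65    # maximum weighted inconsistency of activities


def in_domain(rho_s, i_a):
    """Return True when a compound satisfies both AD thresholds."""
    return rho_s >= RHO_S_MIN and i_a <= I_A_MAX


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = active)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    return 0.5 * (tp / pos + tn / neg)


def roc_auc(y_true, scores):
    """AUC as the probability that a random active outscores a random
    inactive (ties count one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical compounds: (label, predicted probability, rho_s, i_a).
compounds = [
    (1, 0.92, 0.40, 0.10),
    (1, 0.71, 0.22, 0.30),
    (1, 0.35, 0.18, 0.60),
    (0, 0.48, 0.30, 0.20),
    (0, 0.12, 0.25, 0.05),
    (0, 0.05, 0.16, 0.50),
    (1, 0.20, 0.05, 0.20),  # outside the AD: rho_s below 0.15
    (0, 0.88, 0.30, 0.90),  # outside the AD: i_a above 0.65
]

# Keep only compounds inside the applicability domain.
kept = [c for c in compounds if in_domain(c[2], c[3])]
y_true = [c[0] for c in kept]
scores = [c[1] for c in kept]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cutoff

print(f"in-domain compounds: {len(kept)} of {len(compounds)}")
print(f"balanced accuracy:   {balanced_accuracy(y_true, y_pred):.3f}")
print(f"ROC AUC:             {roc_auc(y_true, scores):.3f}")
```

In this sketch, metrics are computed only on compounds that pass both thresholds. Predictions for out-of-domain compounds are set aside rather than scored.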