In recent years, alternatives to animal testing, such as computational
and machine learning approaches, have become increasingly crucial for
toxicity testing. However, the complexity and scarcity of available
biomedical data challenge the development of predictive models. Combining
nonlinear machine learning with multicondition descriptors
offers a solution for using data from various assays to create a robust
model. This work applies multicondition descriptors (MCDs) to develop
a QSTR (Quantitative Structure–Toxicity Relationship) model
based on a large toxicity data set comprising more than 80,000 compounds
and 59 different end points (122,572 data points). The prediction
capabilities of the developed single-task, multi-end point machine learning
models, as well as a novel data analysis approach using convolutional
neural networks (CNNs), are discussed. The results show that using MCDs
significantly improves the model, and using them with CNN-1D yields
the best result ($R^2_{\text{train}} = 0.93$, $R^2_{\text{ext}} = 0.70$). Several
structural features showed a high level of contribution to the toxicity,
including van der Waals surface area (VSA), number of nitrogen-containing
fragments (nN+), presence of S–P fragments, ionization potential,
and presence of C–N fragments. The developed models can serve as
useful tools for predicting the toxicity of various compounds under
different conditions, enabling rapid toxicity assessment of new compounds.