Toxicological information as needed for risk assessments of chemical
compounds is often sparse. Unfortunately, gathering new toxicological
information experimentally often involves animal testing. Simulated
alternatives, e.g., quantitative structure–activity relationship
(QSAR) models, are preferred to infer the toxicity of new compounds.
Aquatic toxicity data collections consist of many related tasks, each
predicting the toxicity of new compounds on a given species. Since
many of these tasks are inherently low-resource, i.e., involve few
associated compounds, building accurate models for them is challenging.
Meta-learning is a subfield
of artificial intelligence that can lead to more accurate models by
enabling the utilization of information across tasks. In our work,
we benchmark various state-of-the-art meta-learning techniques for
building QSAR models, focusing on knowledge sharing between species.
Specifically, we employ and compare transformational machine learning,
model-agnostic meta-learning, fine-tuning, and multi-task models.
Our experiments show that established knowledge-sharing techniques
outperform single-task approaches. We recommend the use of multi-task
random forest models for aquatic toxicity modeling, which matched
or exceeded the performance of other approaches and robustly produced
good results in the low-resource settings we studied. This model operates
at the species level, predicting toxicity for multiple species across
various phyla, with flexible exposure durations and over a large chemical
applicability domain.
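
One common way to let a single model serve many species is to pool all records and pass the species and the exposure duration to the model as additional input features. The sketch below illustrates only that input layout and is not necessarily the encoding used in this work. The function `build_multitask_row`, the species names, the descriptor values, and the use of hours as the duration unit are all hypothetical placeholders.

```python
from typing import List, Sequence


def build_multitask_row(
    descriptors: Sequence[float],
    species: str,
    species_vocab: Sequence[str],
    exposure_hours: float,
) -> List[float]:
    """Concatenate compound descriptors, a one-hot species code, and duration.

    Assumption: species identity is one-hot encoded and exposure duration is
    a plain numeric feature. Other encodings are equally possible.
    """
    if species not in species_vocab:
        raise ValueError(f"unknown species: {species!r}")
    one_hot = [1.0 if s == species else 0.0 for s in species_vocab]
    return list(descriptors) + one_hot + [float(exposure_hours)]


# Hypothetical vocabulary and records: (descriptors, species, exposure in hours).
species_vocab = ["species_a", "species_b", "species_c"]
records = [
    ([0.1, 0.7], "species_a", 48.0),
    ([0.1, 0.7], "species_c", 96.0),  # same compound, different species
    ([0.9, 0.2], "species_b", 24.0),
]

# One pooled feature matrix covering every species (task) at once.
X = [build_multitask_row(d, s, species_vocab, h) for d, s, h in records]
```

Rows built this way could be passed, together with the measured toxicity values, to any standard regressor such as a random forest. Because all species share one training set, a species with few compounds can draw on structure learned from the others.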