Adaptive stress response pathways (SRPs) restore cellular homeostasis
following perturbation but may activate terminal outcomes like apoptosis,
autophagy, or cellular senescence if disruption exceeds critical thresholds.
Because SRPs hold the key to vital cellular tipping points, they are
targeted for therapeutic interventions and assessed as biomarkers
of toxicity. Hence, we are developing a public database of chemicals
that perturb SRPs to enable new data-driven tools to improve public
health. Here, we report on the automated text-mining pipeline we used
to build and curate the first version of this database. First, we
gathered 100 reference SRP chemicals from published biomarker
studies to bootstrap the database. Second, we used information retrieval
to find co-occurrences of reference chemicals with SRP terms in PubMed
abstracts and determined pairwise mutual information thresholds to
filter biologically relevant relationships. Third, we applied these
thresholds to find 1206 putative SRP perturbagens within thousands
of substances in the Library of Integrated Network-Based Cellular
Signatures (LINCS). Assigning SRP activity to LINCS chemicals required
domain experts to manually review at least three publications for each
of the 1206 chemicals, out of 181,805 abstracts in total. To accomplish this
efficiently, we implemented a machine learning approach to predict
SRP classifications from abstract text and prioritize abstracts for
review. In 5-fold cross-validation testing with a corpus derived from
the 100 reference chemicals, artificial neural networks performed best
(macro-averaged F1 = 0.678) and prioritized 2479 of the 181,805 abstracts
for expert review, which resulted in 457 chemicals annotated with SRP
activities. An
independent analysis of enriched mechanisms of action and chemical
use class supported the text-mined chemical associations (p < 0.05):
heat shock inducers were linked to HSP90 and DNA damage inducers to
topoisomerase inhibition. This database
will enable novel applications of LINCS data to evaluate SRP activities
and to further develop tools for biomedical information extraction
from the literature.
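
The co-occurrence filtering step can be illustrated with a minimal sketch. It assumes the association score is pointwise mutual information computed from abstract-level counts; the abstract does not give the exact formula. The toy corpus, the names `pmi`, `count`, and `score`, and the threshold value are all hypothetical and are not taken from the pipeline described here.

```python
import math

def pmi(n_xy, n_x, n_y, n_total):
    """Pointwise mutual information (base 2) from document counts.

    n_xy: abstracts mentioning both terms; n_x, n_y: abstracts mentioning
    each term; n_total: corpus size. Returns -inf when the terms never
    co-occur.
    """
    if n_xy == 0:
        return float("-inf")
    return math.log2((n_xy * n_total) / (n_x * n_y))

# Hypothetical toy corpus: each abstract reduced to the set of terms it mentions.
abstracts = [
    {"chem_A", "heat shock"},
    {"chem_A", "heat shock"},
    {"chem_A"},
    {"chem_B", "dna damage"},
    {"chem_B", "heat shock"},
    {"chem_B"},
    {"dna damage"},
    {"heat shock"},
]

def count(terms):
    """Number of abstracts that mention every term in `terms`."""
    return sum(1 for a in abstracts if terms <= a)

def score(chemical, srp_term):
    """PMI between a chemical and an SRP term over the toy corpus."""
    return pmi(
        count({chemical, srp_term}),
        count({chemical}),
        count({srp_term}),
        len(abstracts),
    )

# Illustrative threshold only; the study derived its thresholds from the
# reference chemicals.
THRESHOLD = 0.0

kept = [
    (chem, term)
    for chem in ("chem_A", "chem_B")
    for term in ("heat shock", "dna damage")
    if score(chem, term) > THRESHOLD
]
```

In this toy corpus, `kept` retains the pairs that co-occur more often than their individual frequencies would predict and drops the pair that co-occurs less often than expected as well as the pair that never co-occurs.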