Objective: Currently, dedicated tagging staff spend considerable effort assigning clinical codes to patient summaries for public health purposes, and automated tagging based on machine learning is bottlenecked by the limited availability of electronic medical records. Veterinary medical records, a largely untapped data source that could benefit both human and non-human patients, could fill the gap.
Materials and Methods: In this retrospective study, we trained long short-term memory (LSTM) recurrent neural networks (RNNs) on 52,722 human and 89,591 veterinary records. We established relevant baselines by training Decision Trees (DT) and Random Forests (RF) on the same data. Finally, we investigated the effect of merging data across clinical settings and probed model portability.
Results: We show that the LSTM-RNNs accurately classify veterinary/human text narratives into top-level categories with an average weighted macro F1 score of 0.735/0.675, respectively. The evaluation metric for the LSTM was 7% and 8% higher than that of the DT and RF models, respectively. We generally did not find evidence of model portability, although performance increased moderately in select categories.
Discussion: We see a strong positive correlation between number of training samples and classification performance, which is promising for future efforts. The use of LSTM-RNN models represents a scalable structure that could prove useful in cohort selection, which could in turn better address emerging public health concerns.
Conclusion: Digitization of human and veterinary health information will continue to be a reality. Our approach is a step forward for these two domains to learn from, and inform, one another.