Zoonosis, the natural transmission of infections from animals to humans, is a far-reaching global problem. The recent outbreaks of Zika virus and Ebola virus are examples of viral zoonoses, which occur more frequently due to globalization. In the case of a virus outbreak, it is helpful to know which host organism was the original carrier of the virus. Once the reservoir or intermediate host is known, it can be isolated to prevent further spread of the viral infection. Recent approaches aim to predict a viral host based on the viral genome, often in combination with the potential host genome and using arbitrarily selected features. These methods are clearly limited in either the number of different hosts they can predict or the accuracy of the prediction. Here, we present a fast and accurate deep learning approach for viral host prediction that is based on the viral genome sequence only. To ensure high prediction accuracy, we developed an effective selection approach for the training data that avoids biases due to a highly unbalanced number of known sequences per virus-host combination. We tested our deep neural network on three different virus species (influenza A virus, rabies lyssavirus, rotavirus A) and reached an AUC between 0.94 and 0.98 for each virus species, outperforming previous approaches and allowing highly accurate predictions while using only fractions of the viral genome sequences. We show that deep neural networks are suitable for predicting the host of a virus, even with a limited number of sequences and highly unbalanced available data. The deep neural networks trained for this approach form the core of the virus host prediction tool VIDHOP (VIrus Deep learning HOst Prediction).

doi: bioRxiv preprint

… shown to perform well with character sequences 28, such as DNA or RNA sequences, potentially allowing for a further increase in prediction quality.
Author Contributions