BackgroundAntibiotic resistance and its rapid dissemination around the world threaten the efficacy of currently-used medical treatments and call for novel, innovative approaches to manage multi-drug resistant infections. Phage therapy, i.e., the use of viruses (phages) to specifically infect and kill bacteria during their life cycle, is one of the most promising alternatives to antibiotics. It is based on the correct matching between a target pathogenic bacteria and the therapeutic phage. Nevertheless, correctly matching them is a major challenge. Currently, there is no systematic method to efficiently predict whether phage-bacterium interactions exist and these pairs must be empirically tested in laboratory. Herein, we present our approach for developing a computational model able to predict whether a given phage-bacterium pair can interact based on their genome.ResultsBased on public data from GenBank and phagesDB.org, we collected more than a thousand positive phage-bacterium interactions with their complete genomes. In addition, we generated putative negative (i.e., non-interacting) pairs. We extracted, from the collected genomes, a set of informative features based on the distribution of predictive protein-protein interactions and on their primary structure (e.g. amino-acid frequency, molecular weight and chemical composition of each protein). With these features, we generated multiple candidate datasets to train our algorithms. On this base, we built predictive models exhibiting predictive performance of around 90% in terms of F1-score, sensitivity, specificity, and accuracy, obtained on the test set with 10-fold cross-validation.ConclusionThese promising results reinforce the hypothesis that machine learning techniques may produce highly-predictive models accelerating the search of interacting phage-bacteria pairs.
The need to predict phage-bacteria interactions is a nowadays concern to overcome bacterial resistance issue; public genome databases contain highly imbalanced datasets which have hindered this task. Throughout this paper we will investigate, implement and evaluate One-Class Learning algorithms in order to predict phage-bacteria interactions using only positive samples. We will use the programming language Python aided by Scikit-Learn, Tensorflow and keras to develop the machine learning models and test them with real phagebacteria interactions datasets. We trained the models using cross validation technique generating a gridsearch with all the datasets to find several combinations of hyperparameters available. Furthermore, we optimized those hyperparameters by using Pareto fronts based on seven different performance metrics, improving the efficiency of each algorithm for a given dataset. To refine each algorithm's performance separately we used the ensemble learning technique with an odd number of algorithms by simple voting. Finally, we managed to achieve an overall performance of 80% in predicting phage-bacteria interactions trained only with positive classes, this percentage in practice means that when a patient has an infection resistant to antibiotics, we have 80% of saving the life rather than maybe a 0% while finding the correct phage for the pathogenic host.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.