Motivation
Existing computational models can predict single- and double-mutant fitness, but they have limitations. First, they are often evaluated with metrics that are inappropriate for imbalanced datasets. Second, they all predict only a binary outcome (viable or not, negatively interacting or not). Third, most are uninterpretable black-box machine learning models.
Results
Budding yeast datasets were used to develop high-performance Multinomial Regression (MN) models capable of predicting the impact of single, double, and triple genetic disruptions on viability. These models are interpretable, give realistic non-binary predictions, and can predict negative genetic interactions in triple-gene knockouts. They are based on a limited set of gene features, and their predictions are influenced by the probability of the target gene participating in molecular complexes or pathways. Furthermore, the MN models have utility in other organisms such as fission yeast, fruit flies, and humans, with the single-gene fitness MN model able to distinguish essential genes necessary for cell-autonomous viability from those required for multicellular survival. Finally, our models exceed the performance of previous models without sacrificing interpretability.
Availability
All code used to generate the results and figures in this manuscript is available in our GitHub repository at https://github.com/KISRDevelopment/cell_viability_paper. The repository also contains a link to the genetic interaction (GI) prediction website, which lets users search for GIs using the MN models.
Supplementary information
Supplementary data are available at Bioinformatics online.