The correct identification of metabolic activity in tissues or cells under different environmental or genetic conditions can be extremely elusive due to mechanisms such as post-transcriptional modification of enzymes or different rates of protein degradation, which make it difficult to predict activity on the basis of gene expression alone. Context-specific metabolic network reconstruction can overcome these limitations by integrating multi-omics data into genome-scale metabolic networks (GSMN). Using the experimental information, context-specific models are reconstructed by extracting from the GSMN the sub-network most consistent with the data, subject to biochemical constraints. One advantage is that these context-specific models have more predictive power, since they are tailored to the specific organism and condition and contain only the reactions predicted to be active in that context. A major limitation of this approach is that the available information does not generally allow for an unambiguous characterization of the corresponding optimal metabolic sub-network, i.e., there are usually many different sub-networks that optimally fit the experimental data. This set of optimal networks represents alternative explanations of the possible metabolic state. Ignoring the set of possible solutions reduces the ability to obtain relevant information about the metabolism and may bias the interpretation of the true metabolic state.
In this work, we formalize the problem of enumeration of optimal metabolic networks, we implement a set of techniques that can be used to enumerate optimal networks, and we introduce DEXOM, a novel strategy for diversity-based extraction of optimal metabolic networks.
Instead of enumerating the whole space of optimal metabolic networks, which can be computationally intractable, DEXOM samples solutions from the set of optimal metabolic sub-networks while maximizing diversity, in order to obtain a good representation of the possible metabolic state. We evaluate the solution diversity of the different techniques using simulated and real datasets, and we show how this method can be used to improve in silico gene essentiality predictions in Saccharomyces cerevisiae using diversity-based metabolic network ensembles. Both the code and the data used for this research are publicly available at https://github.com/MetExplore/dexom.
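To make the idea of multiple optimal sub-networks concrete, the following is a minimal toy sketch in Python. It is not the DEXOM algorithm and does not use the DEXOM code: the six-reaction network, the `weights` vector, the `requires` dependency pairs (a crude stand-in for biochemical constraints), and all function names are hypothetical and chosen only for illustration. The sketch enumerates every feasible on/off assignment by brute force, keeps those with the best agreement with the data, and then greedily picks a small ensemble that is spread out in Hamming distance.

```python
from itertools import product

# Hypothetical toy data: +1 = evidence the reaction is active,
# -1 = evidence it is inactive, 0 = no experimental information.
weights = [1, 0, -1, 0, 1, 0]

# Hypothetical stand-in for biochemical constraints:
# each pair (j, i) means reaction j can only be active if reaction i is active.
requires = [(1, 0), (4, 3)]


def feasible(x):
    """Check that every active reaction has its required reaction active."""
    return all(x[i] or not x[j] for j, i in requires)


def score(x):
    """Count reactions whose on/off state agrees with the data."""
    return sum(
        1
        for w, xi in zip(weights, x)
        if (w > 0 and xi == 1) or (w < 0 and xi == 0)
    )


def enumerate_optimal():
    """Brute-force all feasible sub-networks and keep the best-scoring ones."""
    candidates = [x for x in product((0, 1), repeat=len(weights)) if feasible(x)]
    best = max(score(x) for x in candidates)
    return best, [x for x in candidates if score(x) == best]


def hamming(a, b):
    """Number of reactions whose state differs between two sub-networks."""
    return sum(ai != bi for ai, bi in zip(a, b))


def pick_diverse(solutions, k):
    """Greedy max-min selection: repeatedly add the solution farthest
    (in Hamming distance) from those already chosen."""
    chosen = [solutions[0]]
    while len(chosen) < min(k, len(solutions)):
        remaining = [s for s in solutions if s not in chosen]
        chosen.append(
            max(remaining, key=lambda s: min(hamming(s, c) for c in chosen))
        )
    return chosen


best, optimal = enumerate_optimal()
ensemble = pick_diverse(optimal, 2)
print(f"best score: {best}, optimal sub-networks: {len(optimal)}")
print("diverse ensemble:", ensemble)
```

Even in this tiny example, reactions without data remain undetermined, so several sub-networks tie for the best score. Brute-force enumeration is only feasible at this scale; on a genome-scale network the same question would have to be posed to an optimization solver, which is why sampling a diverse subset rather than enumerating everything becomes attractive.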