Genome-scale models of Escherichia coli K-12 MG1655 metabolism have been able to predict growth phenotypes in most, but not all, defined growth environments. Here we introduce the use of an optimization-based algorithm that predicts the missing reactions that are required to reconcile computation and experiment when they disagree. The computer-generated hypotheses for missing reactions were verified experimentally in five cases, leading to the functional assignment of eight ORFs (yjjLMN, yeaTU, dctA, idnT, and putP) with two new enzymatic activities and four transport functions. This study thus demonstrates the use of systems analysis to discover metabolic and transport functions and their genetic basis by a combination of experimental and computational approaches.

constraint-based | flux balance analysis | functional genomics | metabolic reconstruction | systems biology

Current genome annotations include a substantial fraction of ORFs with unknown function (1, 2); methods are needed to provide insight into the possible function of these genes, without the need for screening individual gene products across a multitude of possible activities. Metabolic and regulatory networks are reconstructed from genome annotations and scientific literature to integrate and represent our current knowledge of network components and interactions (3). Dual-perturbation methods have been developed to study regulatory networks (4), which can be used to reconcile model predictions and experimental data, thus leading to possible iterative model refinements and experimentally testable hypotheses (4, 5). Iterative model building can be systematized through the use of computational algorithms (6). Such an approach is presented here, and it consists of four steps. First, computational analysis identified discrepancies between model predictions and growth phenotyping data by using a reconstructed genome-scale Escherichia coli metabolic network.
Second, an algorithm identified enzymatic and transport reactions that were likely missing from the current metabolic reconstruction and that could reconcile model predictions and experimental observations. Third, ORFs that might be responsible for these missing activities were then identified by using literature searches, sequence-homology searches (7), context-based homology methods (8, 9), and in some cases unpublished microarray data. Fourth, experimental verification of the algorithm's predictions was then carried out by evaluating growth phenotypes of single-deletion strains available in the Keio collection (10) and gene-expression measurements. Here we present a comprehensive combined computational and experimental approach to analyze phenotypic data and genome annotation information in a global manner to uncover individual ORF function.
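The first two steps above (detecting a false no-growth prediction, then proposing a smallest set of reactions that repairs it) can be illustrated with a toy sketch. This is not the optimization-based algorithm used in this study: it substitutes a simple reachability check for flux balance analysis and brute-force enumeration for the optimization, and every reaction and metabolite name (`transport_A`, `B_to_P`, `A_ext`, and so on) is a hypothetical placeholder rather than part of the E. coli reconstruction.

```python
from itertools import combinations

# Hypothetical toy network: reaction name -> (substrates, products).
MODEL = {
    "transport_A": ({"A_ext"}, {"A"}),
    "A_to_P": ({"A"}, {"P"}),
    "biomass_rxn": ({"P"}, {"biomass"}),
}

# Hypothetical candidate reactions, standing in for a universal reaction database.
CANDIDATES = {
    "transport_B": ({"B_ext"}, {"B"}),
    "B_to_P": ({"B"}, {"P"}),
    "B_to_X": ({"B"}, {"X"}),
    "C_to_P": ({"C"}, {"P"}),
}


def can_grow(reactions, medium, target="biomass"):
    """Predict growth by reachability: fire every reaction whose substrates
    are all available until no new metabolite appears (a stand-in for FBA)."""
    available = set(medium)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions.values():
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return target in available


def minimal_gap_fills(model, candidates, medium, max_added=2):
    """Return every smallest set of candidate reactions (as sorted name tuples)
    whose addition makes the model predict growth on the given medium."""
    for size in range(max_added + 1):
        hits = []
        for combo in combinations(sorted(candidates), size):
            extended = dict(model)
            extended.update({name: candidates[name] for name in combo})
            if can_grow(extended, medium):
                hits.append(combo)
        if hits:
            return hits  # stop at the smallest size that works
    return []


# Suppose growth on B_ext is observed experimentally but not predicted.
print(can_grow(MODEL, {"B_ext"}))                       # False
print(minimal_gap_fills(MODEL, CANDIDATES, {"B_ext"}))  # [('B_to_P', 'transport_B')]
```

In this toy case the sketch proposes one transport reaction and one enzymatic reaction, which mirrors the kind of hypothesis (a missing transporter plus a missing enzyme) that would then be passed to the ORF-identification and experimental-verification steps.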
Results

Growth phenotyping data (11), available from Biolog (Hayward, CA; www.biolog.com), were used to identify missing reactions from the reconstructed genome-scale metabolic network of E. coli MG1655 (iJR904) (12). Using a flux balance model of E. coli, we ...