Version 6 of the consensus yeast metabolic network refines biochemical coverage and improves model performance

Table 1.

Summary statistics of Yeast 5 and Yeast 6

Summary statistics	Yeast 5	Yeast 6
Genes	918	900
Metabolites	1655	1458
Reactions	2110	1888
Reactions with PMID references	37.6%	40.4%

Because of additional quality curation, Yeast 6 has fewer blocked reactions and a greater percentage of reactions annotated with literature evidence; it also contains fewer genes, metabolites and reactions than Yeast 5. Neither reconstruction includes open reading frames annotated as ‘dubious’ in the Saccharomyces Genome Database (7).

Table 2.

Comparing gene essentiality predictions of Yeast 5 and Yeast 6

^a Sensitivity: TP/(TP + FN)

^b Specificity: TN/(TN + FP)

^c Positive predictive value: TP/(TP + FP)

^d Negative predictive value: TN/(TN + FN)

^e Matthews correlation coefficient: (TP × TN - FP × FN)/√[(TP + FP)(TP + FN)(TN + FP)(TN + FN)] (8)

Yeast 6 has fewer false-positive predictions and more true-negative predictions of gene essentiality, leading to improvements in sensitivity, specificity, positive predictive value, negative predictive value and Matthews correlation coefficient (see Discussion for more information on the use of this metric).
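The footnote formulas can be gathered into one helper. This is a minimal Python sketch that assumes the convention used in the text, where a "positive" prediction means FBA predicts growth after a deletion (so a false positive is predicted growth for an experimentally essential gene), and that footnote e is the Matthews correlation coefficient discussed in the text. The counts in the usage line are hypothetical and are not the Table 2 values.

```python
import math


def contingency_metrics(tp, tn, fp, fn):
    """Compute the Table 2 metrics from one model's contingency matrix.

    Assumed convention: 'positive' = FBA predicts growth after the deletion.
    Every margin (tp + fn, tn + fp, tp + fp, tn + fn) must be nonzero.
    """
    sensitivity = tp / (tp + fn)   # footnote a
    specificity = tn / (tn + fp)   # footnote b
    ppv = tp / (tp + fp)           # footnote c (precision)
    npv = tn / (tn + fn)           # footnote d
    # Matthews correlation coefficient (assumed to be footnote e)
    mcc = (tp * tn - fp * fn) / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"sensitivity": sensitivity, "specificity": specificity,
            "ppv": ppv, "npv": npv, "mcc": mcc}


# Hypothetical counts for illustration only
metrics = contingency_metrics(tp=700, tn=60, fp=40, fn=100)
print({name: round(value, 3) for name, value in metrics.items()})
```

Comparing two models then amounts to calling the helper once per model and reading the five values side by side, as Table 2 does.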

Table 3.

Comparing auxotroph mutant predictions of Yeast 5 and Yeast 6

Simulation result	Yeast 5	Yeast 6
Auxotroph-inducing genes	92	93
Correct auxotrophy predictions	57	64
Mutants incorrectly predicted to be viable in minimal medium	32	26
Mutants incorrectly predicted to be inviable in supplemented medium	3	3

Yeast 6 has more genes for which a deletion has been reported to cause auxotrophy and has more correct predictions of such auxotrophy than Yeast 5.
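For each model, the three outcome rows of Table 3 sum to the number of auxotroph-inducing genes (57 + 32 + 3 = 92; 64 + 26 + 3 = 93). Assuming each gene is counted in exactly one row, the fraction of correct auxotrophy predictions follows directly; a minimal Python check:

```python
# Counts transcribed from Table 3
TABLE3 = {
    "Yeast 5": {"genes": 92, "correct": 57, "viable_minimal": 32, "inviable_supplemented": 3},
    "Yeast 6": {"genes": 93, "correct": 64, "viable_minimal": 26, "inviable_supplemented": 3},
}


def auxotrophy_accuracy(row):
    """Fraction of auxotroph-inducing genes whose auxotrophy is predicted correctly."""
    outcomes = row["correct"] + row["viable_minimal"] + row["inviable_supplemented"]
    # Assumption: each auxotroph-inducing gene falls into exactly one outcome row
    if outcomes != row["genes"]:
        raise ValueError("outcome rows do not sum to the number of genes")
    return row["correct"] / row["genes"]


for model, row in TABLE3.items():
    print(f"{model}: {auxotrophy_accuracy(row):.1%} correct")
```

This gives 62.0% (57/92) for Yeast 5 and 68.8% (64/93) for Yeast 6.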

Novel features of Yeast 6

Metabolites, reactions and genes differ between Yeast 5 and Yeast 6. Detailed lists of common and distinct metabolites, reactions, genes, constraints, auxotrophs and knockout predictions are included as supplementary data, as is the code used to generate these comparisons. The supplementary data also give the specific rationale for each reaction deletion and addition, as well as detailed descriptions of the function of each gene removed or added. In summary, Yeast 5 has 203 metabolites that are not in Yeast 6, whereas Yeast 6 introduces 6 new metabolites. The smaller number of metabolites in Yeast 6 arises from the removal of metabolites involved in dead-end reactions annotated as modeling reactions (i.e. reactions included in Yeast 5 without biochemical or genomic evidence). No dead-end reactions annotated with a gene were removed. Yeast 5 has 242 reactions that are not in Yeast 6, whereas Yeast 6 has 20 reactions that are not in Yeast 5. As detailed in the supplementary data, reactions removed in Yeast 6 include those with incorrect cofactor specificity, those involving protein modification, those not supported upon literature review and dead-end modeling reactions. Yeast 5 has 31 genes that are not in Yeast 6, and Yeast 6 has 13 genes that are not in Yeast 5. Of the 31 genes removed, 21 encode proteins involved in protein modification or glycosylphosphatidylinositol (GPI)-anchor assembly (processes that are not strictly metabolic), 4 are annotated with ‘putative’ function, 1 was removed because of unclear cofactor specificity, 2 were associated with poorly characterized reactions involved in ergosterol biosynthesis, 1 encodes a flippase (which would require a finer level of membrane compartmentalization than currently exists in the model) and the metabolic functions of the remaining 2 are not well established.
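Because Yeast 6 keeps the Yeast 5 identifiers, these comparisons reduce to set operations on identifier lists, and the reported counts must reconcile with Table 1 as |Yeast 6| = |Yeast 5| - |Yeast 5 only| + |Yeast 6 only|. The sketch below is a hypothetical Python illustration with toy identifiers; it is not the comparison code distributed in the supplementary data.

```python
def compare_ids(old_ids, new_ids):
    """Split two identifier collections into shared, removed and added sets."""
    old_ids, new_ids = set(old_ids), set(new_ids)
    return {
        "common": old_ids & new_ids,
        "removed": old_ids - new_ids,  # present only in the older model
        "added": new_ids - old_ids,    # present only in the newer model
    }


def reconciles(n_old, n_removed, n_added, n_new):
    """Check that |new| = |old| - |old only| + |new only|."""
    return n_old - n_removed + n_added == n_new


# Toy identifiers (hypothetical, not taken from either model)
diff = compare_ids({"r_0001", "r_0002", "r_0003"}, {"r_0001", "r_0003", "r_9999"})
print(sorted(diff["removed"]), sorted(diff["added"]))

# Counts from the paragraph above and from Table 1: (old, removed, added, new)
REPORTED = {
    "metabolites": (1655, 203, 6, 1458),
    "reactions": (2110, 242, 20, 1888),
    "genes": (918, 31, 13, 900),
}
for kind, counts in REPORTED.items():
    print(kind, reconciles(*counts))
```

All three reported comparisons reconcile (for example, 2110 - 242 + 20 = 1888 reactions).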

Essentiality and auxotrophy predictions with Yeast 6

These changes to the reconstructed metabolic network, combined with the incorporation of suggested changes to the biomass definition (8), alter FBA predictions of gene essentiality under glucose-limited aerobic conditions and predictions of auxotrophy induced by gene deletion. When compared with a list of essential genes we compiled (included in the testYeast.m file in the supplementary data), Yeast 6 makes 13 new true-positive predictions of gene essentiality (i.e. FBA predicts that biomass can be produced following the deletion of inessential genes) and 27 new true-negative predictions (i.e. FBA predicts that biomass cannot be produced when essential genes are deleted); it also includes 1 new gene predicted to induce auxotrophy following mutation and makes 7 newly correct predictions of auxotrophy phenotypes. Yeast 6 also introduces new predictions that differ from laboratory observations: two new false positives (growth predicted despite the deletion of an essential gene), one new auxotrophic mutant incorrectly predicted to be incapable of growth in supplemented medium and two new mutants incorrectly predicted to be viable in minimal medium despite reported auxotrophy. We note that such differences between model prediction and observation may arise from regulatory constraints that are outside the scope of a metabolic reconstruction, or may indicate opportunities for continued network curation.
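Under this convention each gene deletion falls into one cell of the contingency matrix, and "new" predictions are genes that one model places in a cell and the other does not. The supplementary testYeast.m script is MATLAB and is not reproduced here; this Python sketch assumes deletion results are already available as predicted biomass fluxes (e.g. from an FBA solver), and the gene names, flux values and growth threshold are hypothetical.

```python
from collections import Counter


def classify(biomass_flux, essential, threshold=1e-6):
    """Assign one gene deletion to a contingency cell (positive = growth predicted).

    threshold is a hypothetical cutoff below which biomass flux counts as no growth.
    """
    grows = biomass_flux > threshold
    if grows and not essential:
        return "TP"   # inessential gene, growth predicted
    if not grows and essential:
        return "TN"   # essential gene, no growth predicted
    if grows and essential:
        return "FP"   # growth predicted despite deleting an essential gene
    return "FN"       # no growth predicted for an inessential gene


def newly_in(cell, old_calls, new_calls):
    """Genes the newer model places in `cell` that the older model did not."""
    return {gene for gene, call in new_calls.items()
            if call == cell and old_calls.get(gene) != cell}


# Hypothetical deletion results: gene -> (predicted biomass flux, observed essential?)
old_results = {"geneA": (0.0, False), "geneB": (0.07, True), "geneC": (0.05, True)}
new_results = {"geneA": (0.08, False), "geneB": (0.0, True), "geneC": (0.05, True)}

old_calls = {gene: classify(*result) for gene, result in old_results.items()}
new_calls = {gene: classify(*result) for gene, result in new_results.items()}
print(Counter(new_calls.values()))
print("new TP:", newly_in("TP", old_calls, new_calls),
      "new TN:", newly_in("TN", old_calls, new_calls))
```

Counting the output of newly_in for each cell gives tallies of the kind reported above (new true positives, new true negatives, new false positives).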

Yeast 6 aerobic and anaerobic growth

Unlike Yeast 5, Yeast 6 does not include separate biomass definitions for simulating aerobic and anaerobic growth. Yeast 6 correctly predicts that yeast will not grow on minimal medium under strict anaerobic conditions [Saccharomyces cerevisiae requires supplementation with unsaturated fatty acids and sterols (9, 10)]. Simulating anaerobic growth with Yeast 6 therefore requires the simulated medium to be supplemented with sterols [i.e. the exchange reactions for episterol, ergosterol, fecosterol, lanosterol, zymosterol and ergosta-5,7,22,24(28)-tetraen-3beta-ol must have nonzero lower bounds; because negative exchange flux denotes consumption in this model, these lower bounds are negative].
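With negative exchange flux denoting consumption (the sign convention described under Data and annotation standards), allowing sterol uptake means giving each sterol exchange a negative lower bound, while strict anaerobiosis means a lower bound of zero on oxygen exchange. This Python sketch edits a plain dictionary of bounds; the "EX_<compound>" key scheme, the glucose and oxygen entries and the uptake value of 1.0 are hypothetical placeholders, not Yeast 6 identifiers or recommended rates. In COBRApy the analogous change is assigning the lower_bound attribute of each exchange Reaction.

```python
STEROLS = [
    "episterol", "ergosterol", "fecosterol", "lanosterol", "zymosterol",
    "ergosta-5,7,22,24(28)-tetraen-3beta-ol",
]


def make_anaerobic(bounds, sterol_uptake=1.0, default_upper=1000.0):
    """Return a copy of exchange bounds configured for strict anaerobic growth.

    `bounds` maps hypothetical exchange keys ("EX_<compound>") to (lower, upper).
    Negative flux means consumption, so permitting uptake needs a negative lower bound.
    """
    new = dict(bounds)
    _, oxygen_upper = new["EX_oxygen"]
    new["EX_oxygen"] = (0.0, oxygen_upper)      # block oxygen uptake
    for sterol in STEROLS:
        key = f"EX_{sterol}"
        _, upper = new.get(key, (0.0, default_upper))
        new[key] = (-sterol_uptake, upper)      # allow sterol uptake
    return new


aerobic = {"EX_glucose": (-10.0, 1000.0), "EX_oxygen": (-1000.0, 1000.0)}
anaerobic = make_anaerobic(aerobic)
print(anaerobic["EX_oxygen"], anaerobic["EX_ergosterol"])
```

Returning a copy keeps the aerobic medium definition intact, so both conditions can be simulated from the same starting bounds.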

Data and annotation standards

The network reconstruction is provided as an SBML (11) file enriched with MIRIAM-compliant (12) annotations. Small molecules and macromolecules are cross-referenced to community-standard databases such as UniProt (13) and ChEBI (14). Molecules and reactions are also annotated with publications that contain supporting evidence. Thus, this network is presented in a computational framework that adheres to community standards and is entirely traceable. To facilitate comparison between reconstructions and models, Yeast 6 metabolite and reaction identifiers are consistent with Yeast 5 identifiers (e.g. reaction ‘r_0123’ in Yeast 5 is the same reaction as ‘r_0123’ in Yeast 6, and metabolite ‘s_0042’ in Yeast 5 is the same metabolite as ‘s_0042’ in Yeast 6).
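In SBML, MIRIAM annotations are stored as RDF inside each element's annotation, with database cross-references such as ChEBI terms typically written as identifiers.org URIs under the BioModels qualifier bqbiol:is. The sketch below reads them with the Python standard library only. The species element, its id s_example, the SBML Level 2 Version 4 namespace and the ChEBI term (water, CHEBI:15377) are illustrative assumptions, not an excerpt from the Yeast 6 file.

```python
import xml.etree.ElementTree as ET

SBML = """<sbml xmlns="http://www.sbml.org/sbml/level2/version4" level="2" version="4">
  <model id="toy">
    <listOfSpecies>
      <species metaid="meta_s_example" id="s_example" name="H2O" compartment="c">
        <annotation>
          <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                   xmlns:bqbiol="http://biomodels.net/biology-qualifiers/">
            <rdf:Description rdf:about="#meta_s_example">
              <bqbiol:is>
                <rdf:Bag>
                  <rdf:li rdf:resource="http://identifiers.org/chebi/CHEBI:15377"/>
                </rdf:Bag>
              </bqbiol:is>
            </rdf:Description>
          </rdf:RDF>
        </annotation>
      </species>
    </listOfSpecies>
  </model>
</sbml>"""

NS = {
    "sbml": "http://www.sbml.org/sbml/level2/version4",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "bqbiol": "http://biomodels.net/biology-qualifiers/",
}
RDF_RESOURCE = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}resource"


def species_annotations(sbml_text):
    """Map each species id to the URIs it is linked to via bqbiol:is."""
    root = ET.fromstring(sbml_text)
    result = {}
    for species in root.iterfind(".//sbml:species", NS):
        uris = [li.get(RDF_RESOURCE)
                for li in species.iterfind(".//bqbiol:is/rdf:Bag/rdf:li", NS)]
        result[species.get("id")] = uris
    return result


print(species_annotations(SBML))
```

Because the annotations live in standard SBML rather than in tool-specific notes, any SBML-aware parser can recover the same cross-references.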

Yeast 6 follows the same modeling conventions as Yeast 5. We used the SBML specification for encoding reaction and metabolite annotation rather than the COBRA Toolbox-specific convention of using a custom ‘Notes’ field. Our sign convention for exchange reactions is that positive flux values represent compounds produced in FBA simulation and negative flux values represent compounds consumed. We also include biomass as a specific species in the model.

Assessing metabolic models

Because a gene picked at random is more likely to be inessential than essential for growth (15), overall accuracy is not a good metric for assessing model predictive ability (16). This statistical issue has been recognized previously, leading to the use of the ‘geometric mean accuracy’ as a metric for evaluating metabolic network models (2). Because the geometric mean ignores the positive predictive value (also called precision) (17), we report all values of the contingency matrix (Table 2) and summarize the predictive ability of the model with the Matthews correlation coefficient (18), a metric that is robust across a range of prevalence values and incorporates both positive and negative predictive values.
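Two hypothetical contingency matrices make these points concrete. A predictor that calls every deletion viable scores high overall accuracy when inessential genes dominate, yet its geometric mean and MCC are both zero. Two predictors with identical sensitivity (0.9) and specificity (0.5) share a geometric mean but differ sharply in positive predictive value once prevalence changes. The gene counts are invented for illustration; the geometric mean is taken as the square root of sensitivity × specificity, and an MCC of 0 is used when a margin is empty.

```python
import math


def summarize(tp, tn, fp, fn):
    """Overall accuracy, geometric-mean accuracy, PPV and MCC (positive = viable)."""
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "g_mean": math.sqrt(sens * spec),
        "ppv": ppv,
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,  # 0 when a margin is empty
    }


# Hypothetical: 800 inessential (positive) and 100 essential (negative) genes
always_viable = summarize(tp=800, tn=0, fp=100, fn=0)

# Same sensitivity and specificity at two different prevalences
viable_majority = summarize(tp=720, tn=50, fp=50, fn=80)
essential_majority = summarize(tp=90, tn=400, fp=400, fn=10)

for name, m in [("always viable", always_viable),
                ("viable majority", viable_majority),
                ("essential majority", essential_majority)]:
    print(name, {k: round(v, 3) for k, v in m.items()})
```

The always-viable predictor reaches an accuracy of about 0.89 with a geometric mean of 0; the last two cases share a geometric mean of about 0.67 while their PPVs differ several-fold.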

We include the testYeast.m script as supplementary data to facilitate evaluation of this model. This script compares model phenotype predictions against lists of verified yeast open reading frames, genes that we consider essential and genes that cause auxotrophy upon deletion. We compiled these lists from the Yeast Deletion Project (15) and from information in the Saccharomyces Genome Database (19).

Discussion

Yeast 6 is the current state-of-the-art reconstruction of the S. cerevisiae metabolic network. It eliminates many inferred reactions for which there is no evidence, adds new reactions based on recent evidence and results in improved predictions of experimental data. It maintains the distinction between GENRE and GEM, and by emphasizing traceable annotation for included information, it differentiates between established biochemistry and hypotheses that may be generated by automated techniques such as gap-filling algorithms (20). It will thus be a useful addition to the consensus resource and to the large community of researchers who use the yeast metabolic model to guide experimental and modeling efforts.

Limitations

Improving the reconstruction of the yeast metabolic network remains an ongoing project. In addition to the model predictions that differ from experimental observations described above (i.e. false-positive, false-negative and incorrect auxotrophy predictions), there remains substantial opportunity to improve the reconstruction of lipid metabolism. This is most evident in the fact that unsaturated fatty acids are not currently required for simulating anaerobic growth, but it is also reflected in the continued use of generic lipid species [i.e. compounds with generic residual (-R) groups rather than precise, stoichiometrically balanced definitions of fatty acid moieties].

Additional limitations arise from the appropriately limited scope of the metabolic network reconstruction. Condition-dependent constraints that arise from various regulatory mechanisms are not included in Yeast 6. Thus, pathways affected by transcriptional regulatory events such as glucose repression may be incorrectly predicted to carry flux in FBA (e.g. in the absence of additional constraints, malate can cycle between mitochondrial malate dehydrogenase and cytoplasmic malate dehydrogenase). Integration of regulatory and metabolic networks remains an area of active research (21, 22).
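FBA imposes only the steady-state condition S·v = 0 plus flux bounds, so a loop whose steps return every intermediate to its starting pool can run at any rate unless another constraint limits it. The toy below loosely follows the malate example; the reaction names, the simplified stoichiometry (protons and real transport mechanisms omitted) and the transport steps are hypothetical and are not Yeast 6 reactions.

```python
from collections import defaultdict

# Hypothetical, simplified stoichiometry: metabolite -> coefficient per reaction
REACTIONS = {
    "MDH_mito": {"mal_m": -1, "nad_m": -1, "oaa_m": 1, "nadh_m": 1},
    "MDH_cyto": {"oaa_c": -1, "nadh_c": -1, "mal_c": 1, "nad_c": 1},
    "malate_import": {"mal_c": -1, "mal_m": 1},
    "oaa_export": {"oaa_m": -1, "oaa_c": 1},
}


def net_change(fluxes):
    """Net production of each metabolite (the product S·v), omitting zeros."""
    net = defaultdict(float)
    for rxn, v in fluxes.items():
        for met, coef in REACTIONS[rxn].items():
            net[met] += coef * v
    return {met: value for met, value in net.items() if value != 0}


# Push the same flux t through every step of the loop
for t in (1.0, 5.0):
    print(t, net_change({rxn: t for rxn in REACTIONS}))
```

For any loop flux t, malate and oxaloacetate balance exactly; the only net effect is to oxidize cytosolic NADH and reduce mitochondrial NAD, so nothing in the steady-state condition itself fixes the rate of this shuttle.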

As in other metabolic network models, not all reactions in Yeast 6 can carry flux in FBA simulation. Yeast 6 has 738 blocked reactions (39%), a proportion similar to that of Yeast 5 (38%). Blocked reactions indicate knowledge limitations (such as reactions leading to dead-end metabolites whose metabolic fate or origin is unknown, or reactions in unconnected portions of metabolism, which form disconnected subgraphs of the network). As with FBA predictions that differ from observation, the number of blocked reactions is also affected by condition-specific constraints, particularly constraints on reaction reversibility. We have observed that relaxing the reversibility of reactions involving nucleotide cofactors reduces the number of blocked reactions.
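The text does not state how blocked reactions were counted; flux variability analysis is a common approach. As a dependency-free illustration, the sketch below swaps in a different, purely topological test (iterative removal of reactions that touch dead-end metabolites) on a hypothetical toy network, and shows how relaxing one reaction's reversibility unblocks reactions. This pruning finds only a subset of the reactions that flux-based methods would report as blocked.

```python
from collections import defaultdict


def find_blocked(network):
    """Reactions forced to zero flux by dead-end metabolites (iterative pruning).

    `network` maps reaction id -> (stoichiometry dict, reversible flag).
    """
    active = set(network)
    changed = True
    while changed:
        changed = False
        producers, consumers, involved = defaultdict(set), defaultdict(set), defaultdict(set)
        for rxn in active:
            stoich, reversible = network[rxn]
            for met, coef in stoich.items():
                involved[met].add(rxn)
                if coef > 0 or reversible:
                    producers[met].add(rxn)
                if coef < 0 or reversible:
                    consumers[met].add(rxn)
        for met, rxns in involved.items():
            # A metabolite that cannot be both made and used, or that appears in
            # only one reaction, forces zero steady-state flux through its reactions
            if not producers[met] or not consumers[met] or len(rxns) == 1:
                to_block = rxns & active
                if to_block:
                    active -= to_block
                    changed = True
    return set(network) - active


# Hypothetical toy network; in_A and out_C stand in for exchange reactions
TOY = {
    "in_A": ({"A": 1}, False),
    "R1": ({"A": -1, "B": 1}, False),
    "R2": ({"B": -1, "C": 1}, False),
    "out_C": ({"C": -1}, False),
    "R3": ({"B": -1, "D": 1}, False),
    "R4": ({"E": -1, "B": 1}, False),  # E is never produced
    "R5": ({"C": -1, "D": 1}, False),  # D is produced by R3 and R5 but never consumed
}
print(sorted(find_blocked(TOY)))

# Making R5 reversible lets D drain back to C, unblocking R3 and R5
relaxed = dict(TOY, R5=({"C": -1, "D": 1}, True))
print(sorted(find_blocked(relaxed)))
```

As a check on the reported figure, 738 of 1888 reactions is 39.1%.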

Unlike with Yeast 5, predicting anaerobic ethanol production in FBA simulation with Yeast 6 requires manual restriction of the flux through the reaction catalyzed by ATP synthase. The need for this condition-dependent constraint may arise from the lack of regulatory constraints described above or from the absence of physicochemical capacity constraints on allowable flux; it may also reflect other, presently uncharacterized limitations. As with the integration of regulatory constraints, detailed reconstruction of cofactor and proton balancing remains an area of active research in the constraint-based modeling community (23); redox conditions are clearly of critical importance to the function of ATP synthase.

An invitation to participate in the community effort to reconstruct the yeast metabolic network

Computational reconstruction and modeling of yeast metabolism is an ongoing project, and we invite additional community participation in this effort. Suggestions for improving the yeast consensus reconstruction or derived models should be submitted to network.reconstruction@manchester.ac.uk. Metabolites and enzymes should be unambiguously identified, using existing model or database (ChEBI or UniProt) identifiers. New reactions should be supplied with primary evidence for their mechanism and catalysis, via PubMed identifiers. Reactions without evidence should have clear reasons for their proposed addition.
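Contributors may find it useful to check a suggestion against these requirements before sending it. The Python sketch below assumes a hypothetical dictionary format for a suggestion; the UniProt pattern is UniProt's published accession format, the ChEBI and PubMed patterns reflect how those identifiers are usually written, and the model metabolite pattern is inferred from the ‘s_0042’ example and may not cover every model identifier.

```python
import re

UNIPROT = re.compile(r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}")
CHEBI = re.compile(r"CHEBI:[0-9]+")
MODEL_METABOLITE = re.compile(r"s_[0-9]{4}")  # assumption based on the 's_0042' example
PUBMED = re.compile(r"[1-9][0-9]*")


def check_suggestion(suggestion):
    """List problems with a reaction suggestion (hypothetical dict format)."""
    problems = []
    for met in suggestion.get("metabolites", []):
        if not (CHEBI.fullmatch(met) or MODEL_METABOLITE.fullmatch(met)):
            problems.append(f"unrecognized metabolite identifier: {met}")
    for enzyme in suggestion.get("enzymes", []):
        if not UNIPROT.fullmatch(enzyme):
            problems.append(f"not a UniProt accession: {enzyme}")
    pmids = suggestion.get("pmids", [])
    for pmid in pmids:
        if not PUBMED.fullmatch(pmid):
            problems.append(f"malformed PubMed identifier: {pmid}")
    # Reactions without evidence need a stated reason for their addition
    if not pmids and not suggestion.get("rationale"):
        problems.append("no PubMed evidence and no stated reason for the addition")
    return problems


print(check_suggestion({"metabolites": ["CHEBI:15377", "s_0042"],
                        "enzymes": ["P12345"], "pmids": ["12345678"]}))
print(check_suggestion({"metabolites": ["water"], "enzymes": ["12345"]}))
```

A format check of this kind cannot confirm that an identifier exists or is correct; it only catches malformed or missing entries.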

Acknowledgments

The authors gratefully acknowledge Hnin Aung and Kalliopi Trachana, along with Julie Bletz, James Eddy, John Earls and other members of the Price group for their constructive comments on the manuscript and stimulating discussion of metrics for evaluating model predictive accuracy.

Funding

Support was provided for the work in the Cornell Biofuels Research Laboratory by the U.S. Department of Transportation, Federal Grant (#DTOS59-07-G-00052), and the U.S. Department of Agriculture, Award (#2010-38502-21900). Support was provided for the work at the Institute for Systems Biology by Center for Systems Biology/2P50 GM076547, the Grand Duchy of Luxembourg Systems Medicine Consortium, and the Camille-Dreyfus Teacher Scholar Program (NDP) (to B.D.H.). Support was provided by the EU FP7 (KBBE) grant (289434) ‘BioPreDyn: New Bioinformatics Methods and Tools for Data-Driven Predictive Dynamic Modelling in Biotechnological Applications’ (to K.S.). Funding for open access charge: U.S. Department of Transportation, Federal Grant (#DTOS59-07-G-00052), and the U.S. Department of Agriculture, Award (#2010-38502-21900).

Conflict of interest. None declared.

References

1. Palsson BØ, Herrgård. Connecting extracellular metabolomic measurements to intracellular flux states in yeast. BMC Syst. Biol. 2009.

2. Kuepfer, Sauer, Blank. Metabolic functions of duplicate genes in Saccharomyces cerevisiae. Genome Res. 2005, pp. 1421-1430.

3. Herrgard, Swainston, Dobson, et al. A consensus yeast metabolic network reconstruction obtained from a community approach to systems biology. Nat. Biotech. 2008, pp. 1155-1160.

4. Dobson, Jameson, Simeonidis, et al. Further developments towards a genome-scale metabolic model of yeast. BMC Syst. Biol. 2010, p. 145.

5. Heavner, Smallbone, Barker, et al. Yeast 5 - an expanded reconstruction of the Saccharomyces cerevisiae metabolic network. BMC Syst. Biol. 2012.

6. Schellenberger, Que, Fleming RMT, et al. Quantitative prediction of cellular metabolism with constraint-based models: the COBRA Toolbox v2.0. Nat. Protoc. 2011, pp. 1290-1307.

7. Price, Reed, Palsson. Genome-scale models of microbial cells: evaluating the consequences of constraints. Nat. Rev. Microbiol. 2004, pp. 886-897.

8. Zomorrodi, Maranas. Improving the iMM904 S. cerevisiae metabolic model using essentiality and synthetic lethality data. BMC Syst. Biol. 2010, p. 178.

9. Andreasen, Stier. Anaerobic nutrition of Saccharomyces cerevisiae. II. Unsaturated fatty acid requirement for growth in a defined medium. J. Cell. Physiol. 1954, pp. 271-281.

10. Andreasen, Stier TJB. Anaerobic nutrition of Saccharomyces cerevisiae. I. Ergosterol requirement for growth in a defined medium. J. Cell. Physiol. 1953.

11. Hucka, Finney, Sauro, et al. The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics 2003, pp. 524-531.

12. Le Novère, Finney, Hucka, et al. Minimum information requested in the annotation of biochemical models (MIRIAM). Nat. Biotechnol. 2005, pp. 1509-1515.

13. The UniProt Consortium. Update on activities at the Universal Protein Resource (UniProt) in 2013. Nucleic Acids Res. 2012, pp. D43-D47.

14. Hastings, De Matos, Dekker, et al. The ChEBI reference database and ontology for biologically relevant chemistry: enhancements for 2013. Nucleic Acids Res. 2012, pp. D456-D463.

15. Winzeler, Shoemaker, Astromoff, et al. Functional characterization of the S. cerevisiae genome by gene deletion and parallel analysis. Science 1999, vol. 285, pp. 901-906.

16. Alberg, Park, Hager, et al. The use of “overall accuracy” to evaluate the validity of screening or diagnostic tests. J. Gen. Intern. Med. 2004, pp. 460-465.

17. Kubat, Holte, Matwin. Machine learning for the detection of oil spills in satellite radar images. Mach. Learn. 1998, pp. 195-215.

18. Baldi, Brunak, Chauvin, et al. Assessing the accuracy of prediction algorithms for classification: an overview. Bioinformatics 2000, pp. 412-424.

19. Cherry, Hong, Amundsen, et al. Saccharomyces Genome Database: the genomics resource of budding yeast. Nucleic Acids Res. 2012, pp. D700-D705.

20. Brooks, Burns, Fong, et al. Gap detection for genome-scale constraint-based models. Adv. Bioinformatics 2012, vol. 2012.

21. Blazier, Papin. Integration of expression data in genome-scale metabolic network reconstructions. Front. Physiol. 2012, p. 299.

22. Chandrasekaran, Price. Probabilistic integrative modeling of genome-scale metabolic and regulatory networks in Escherichia coli and Mycobacterium tuberculosis. Proc. Natl Acad. Sci. USA 2010, vol. 107, pp. 17845-17850.