Reconstruction of genome-scale metabolic model of Yarrowia lipolytica and its application in overproduction of triacylglycerol

Yarrowia lipolytica is widely studied as a non-conventional model yeast owing to the high level of lipid accumulation. Therein, triacylglycerol (TAG) is a major component of liposome. In order to investigate the TAG biosynthesis mechanism at a systematic level, a novel genome-scale metabolic model of Y. lipolytica was reconstructed based on a previous model iYL619_PCP published by our lab and another model iYali4 published by Kerkhoven et al. The novel model iYL_2.0 contains 645 genes, 1083 metabolites, and 1471 reactions, which was validated more effective on simulations of specific growth rate. The precision of 29 carbon sources utilities reached up to 96.6% when simulated by iYL_2.0. In minimal growth medium, 111 genes were identified as essential for cell growth, whereas 66 essential genes were identified in yeast extract medium, which were verified by database of essential genes, suggesting a better prediction ability of iYL_2.0 in comparison with other existing models. In addition, potential metabolic engineering targets of improving TAG production were predicted by three in silico methods developed in-house, and the effects of amino acids supplementation were investigated based on model iYL_2.0. The reconstructed model iYL_2.0 is a powerful platform for efficiently optimizing the metabolism of TAG and systematically understanding the physiological mechanism of Y. lipolytica.


Background
Yarrowia lipolytica is a non-conventional model oleaginous yeast, which is capable of accumulating substantial amounts of neutral lipids, even above than 50% of its dry cell weight (DCW) (Blazeck et al. 2014;. Today, the chemical industry has been widely developing technology to synthesize multiple kinds of products using oils and fats, such as surfactants, polymers, and biofuels (Ledesma-Amaro and Nicaud 2016). Triacylglycerol (TAG) was discovered as the major constitute of liposome and predominant storage of metabolic energy in microorganism cells, which can also be widely used as valuable resources for dietary consumption and biofuels in industrial application (Yen et al. 2008).
In recent years, great potential of lipid synthesis in Y. lipolytica attracts much attention to study its physiological mechanism and explore its application in biofuels. Many efforts, including fermentation and genetic manipulation strategies, have been made for improving TAG production in Y. lipolytica (Papanikolaou and Aggelis 2010). It has been reported that the yield of TAG could reach up to 40% of dry cell weight when using glycerol or industrial fats as carbon source . In Y. lipolytica, TAG can be biosynthesized followed by the Kennedy pathway, and the delta-9 stearoyl-CoA desaturase (SCD) was identified as a rate-limiting enzyme in biosynthesis pathway (Dulermo and Nicaud 2011;Kennedy 1957). Besides, genes YALI0E32769g (DGA1) and YALI0E16797g (LRO1), encoding major triacylglycerol synthases, play important roles in the biosynthesis pathway (Athenstaedt 2011). Additionally, the repression of TGL3 and TGL4, encoding triacylglycerol lipases involved in the degradation of TAG, could lead to high levels of TAG accumulation (Morin et al. 2011). Although much improvement of TAG production and lipid accumulation in Y. lipolytica has been achieved, it is still necessary to rationally discover possible limiting factors involved in TAG biosynthesis at a systems level, and in silico analyze its metabolic characteristics under a wide range of genotypic and environmental conditions.
A genome-scale metabolic model (GSMM) is a powerful platform for investigation of microorganism's metabolic processes and rational strain design, as a GSMM may represent the metabolic genotype-phenotype relationships in the microorganism (Feist et al. 2009). GSMMs have been widely used in guidance of strain design, analysis of complex biological phenomena, and contextualization of high-throughput data (Jian et al. 2016a;Loira et al. 2012;Oberhardt et al. 2009;. So far, there are three versions of genomescale metabolic network models of Y. lipolytica, namely iYL619_PCP (Pan and Hua 2012), iNL895 (Loira et al. 2012), and iYali4 (Kerkhoven et al. 2016), where yeast consensus network was used as a reference model when iNL895 and iYali4 were reconstructed. Besides, the model iNL895 was integrated automatically, which may cause a series of network gaps when simulating cell growth or the production of several compounds. However, the knowledge in Y. lipolytica physiology and metabolism is being updated continuously over time, and an up-to-date metabolic model is most likely required to improve the accuracy of prediction. On the other hand, large quantities of in silico optimization methods have been developed and both gene-level and reaction-level strategies can be identified based on constraint-based flux analysis. Normally, gene-level targets are more feasible and efficient than reaction-level ones according to the complex associations between genes and reactions in metabolic model. For instance, our previously published algorithms, Opt-GeneKnock (Zhang et al. 2015a), IdealKnock (Gu et al. 2016), and APGC (Jian et al. 2016b), which could provide potential gene-level strategies based on logic transformation of model (LTM) (Zhang et al. 2015b), are more feasible for in vivo implementation.
In this study, to systematically investigate the metabolism of Y. lipolytica and gain novel insight into TAG biosynthesis of this strain, a new genome-scale metabolic model of Y. lipolytica, named iYL_2.0, was reconstructed. The utilizations of different carbon sources were simulated based on this newly developed metabolic model followed by the validation by experimental data. Also, analysis of gene essentiality of Y. lipolytica cells growth in minimal growth (MG) medium or yeast extract (YE) medium was conducted, and the essential genes predicted based on four Y. lipolytica models in YE medium were further verified by database of essential genes (DEG). Additionally, potential genetic strategies were identified by three recent in silico simulation algorithms in order to enhance the biosynthesis of TAG in Y. lipolytica, and furthermore, the effects of amino acid supplementation on TAG production were investigated based on the novel model of iYL_2.0.

Characteristics of the updated genome-scale metabolic model of Y. lipolytica
The reconstructed genome-scale metabolic model iYL_2.0 (Additional file 1) included 645 genes, 1083 metabolites, and 1471 reactions (1008 metabolic reactions, 326 transport reactions, and 137 exchange reactions) ( Table 1). Compared with iYL619_PCP, the model size of iYL_2.0 was more extensive, but less than iNL895 and iYali4 for the reason that models iNL895 and iYali4 were constructed automatically and several genes, metabolites, and reactions in iNL895 and iYali4 were quoted directly from the metabolic model of Saccharomyces cerevisiae (Heavner et al. 2012). The directionalities or reversibility of all metabolic reactions were defined based on thermodynamics, therefore, there exist 266 reversible and 741 irreversible enzymatic reactions in the model of iYL_2.0. Besides, two additional compartments (peroxisome and endoplasmic reticulum) were introduced in iYL_2.0 compared with its previous version of iYL619_PCP. The peroxisome compartment is mainly associated to glyoxylate and dicarboxylate metabolism, while endoplasmic reticulum compartment is related to glycerolipid and glycerophospholipid metabolism. In addition, 1471 reactions were classified into nine different subsystems based on KEGG pathway database. These subsystems include carbohydrate metabolism, energy metabolism, lipid metabolism, nucleotide metabolism, amino acid metabolism, cofactors and vitamins metabolism, exchange reactions, transport reactions, and other metabolisms (terpenoid backbone biosynthesis and N-glycan biosynthesis). Compared to model iYL619_ PCP, the number of reactions in each subsystem of iYL_2.0 was increased (Fig. 1a). The additional reactions in iYL_2.0 were mainly associated with lipid metabolism (79 reactions), amino acid metabolism (54 reactions), and transport reactions (91 reactions). Specifically, lipid metabolism, the largest subsystem in iYL_2.0, accounts for 18.6% of the total reactions ( Fig. 1b), in comparison with 16.7% in iYL619_PCP and 14.2% in iYali4.

Analysis and verification of model iYL_2.0
In order to quantitatively demonstrate the advantage of iYL_2.0, we firstly simulate the in silico cell growth characteristics in minimal growth medium based on three previously constructed models and the new model iYL_2.0 using flux balance analysis (FBA) (Orth et al. 2010;Toya and Shimizu 2013). The maximum specific growth rate obtained from iYL_2.0 was 0.0378 h −1 , which agreed very well with the experimental value (0.0352 h −1 ) (Oberhardt et al. 2009), better than the results predicted by iYL619_PCP (0.0439 h −1 ), iNL895 (3.4367 h −1 ), and iYali4 (1.7454 h −1 ) ( Table 2). The inaccuracy of simulation result by model iNL895 was largely due to the biomass equation used in iNL895 which was integrated automatically from the yeast consensus model version 4.36 (Herrgard et al. 2008) and might lack of essential gap filling. Similarly, the inconsistency between iYali4 and experimental data might also attribute to the fact that the biomass equation used in iYali4 was quoted directly from the yeast consensus network version 5 (Heavner et al. 2012).
In addition, these models were employed to investigate in silico cell growth on 29 different carbon sources by FBA, in which each substance was used as a sole carbon source and cell growth was optimized. The utilization of different substrates by Y. lipolytica has previously been reported (Sitepu et al. 2014;Wehrspann and Fullbrandt 1985). According to the above literatures, Y. lipolytica could grow on 11 carbon sources, while cell grown was not detected when other 18 compounds were used as carbon sources (Table 3). However, the utilization of 5, 9, and 7 out of 29 carbon sources were not correctly predicted by using models iYL619_PCP, iNL895, and iYali4, respectively. In comparison, the newly developed model iYL_2.0 could be employed to simulate the growth characteristics on 28 carbon sources (expected for hexadecane). Models iYL619_PCP, iNL895, and iYali4 showed false positive growth on trehalose, while iYL_2.0, with trehalose glucohydrolase removed after careful consideration, showed no growth and were consistent with the experimental data. Moreover, iYL619_PCP, the earlier version of iYL_2.0, failed to show the positive growth on d-mannitol and d-glucitol, which was then fixed by adding two reactions (d-mannitol:NAD + 2-oxidoreductase and d-glucitol:NAD + 2-oxidoreductase) enabling the conversion of d-mannitol or d-sorbitol to betad-fructose (Scolnick and Lin 1962). On the other hand, both iNL895 and iYali4 exhibited predictions of positive  growth of Y. lipolytica on sucrose, cellobiose, d-glucosamine, and d-xylose, and iNL895 also showed positive growth of the strain on maltose and melibiose, which were, however, inconsistent with the in vivo experimental result. The false positive growths on sucrose, cellobiose and d-glucosamine probably attributed to a direct quotation of yeast reactions in iNL895 and iYali4, while the false positive growths on d-xylose, melibiose, and maltose may be activated by intention for simulation purpose (Forster et al. 2007;Wang et al. 2014). Interestingly, all four models considered in this study showed false positive growth on hexadecane. We found that when the annotated genes encoding the required enzymes were added to four models, hexadecane could not be utilized as a sole carbon source, which might be due to the absence of the corresponding regulatory mechanism in the four models.
In addition, the essentialities of individual gene of Y. lipolytica grown in both minimal growth medium and yeast extract medium were investigated based on the four above models. All genes involved in iYL_2.0 were classified into three categories: essential genes, partially essential genes, and non-essential genes. As concluded in Additional file 2, 111 (17.21% of the total) and 66 (10.23% of the total) genes were found as essential cells grown in MG medium and YE medium, respectively. The simulated 66 essential genes in YE medium were further examined via BLASTp with the database of essential genes (DEG) (Gao et al. 2015), where 35 genes were found to have a high homology with S. cerevisiae (Giaever et al. 2002). For instance, UMP phosphotransferase (YALI0F09339g) is involved in the synthesis of CMP and UMP, which was essential to cell growth. Similarly, thioredoxin reductase (YALI0D27126g) is essential in maintaining the balance of intracellular NADP + / NADPH, which is vital to the whole metabolic process of Y. lipolytica. In comparison, only 31 of 71, 25 of 143, and 29 of 112 of essential genes obtained from the simulation of iYL619_PCP, iNL895, and iYali4 were confirmed by DEG, respectively, indicating a better prediction performance of essential genes by iYL_2.0. Further, the essentialities of genes involved in eight subsystems of iYL_2.0 are shown in Fig. 2. Most essential genes were found distributed in amino acid metabolism, lipid metabolism, and nucleotide metabolism in case of MG medium. When analyzing gene essentiality for cells grown in YE medium, only approximately 3% genes categorized in amino acid metabolism were considered essential for cell growth due to the fact that yeast extract contains a mix of peptides, several free amino acids, purine, and pyrimidine bases. For transport reaction, the essential genes associated  In vivo: experimental results, In silico: simulation results. '+' represents that the carbon source could be utilized and '−' represents not growth on that carbon source. The green color indicates that in silico result was consistent with in vivo data, whereas red color for the inconsistency. The uptake rate of each carbon source was set at 20 mmol/gDCW/h, and uptake rates of sulfate, phosphate, ammonia, O 2 , H 2 O, CO 2 and H + were set as 1000 mmol/gDCW/h with amino acids metabolism were mainly identified in cells grown in MG medium as the uptake of amino acids needs transport reactions. In addition, for genes categorized in other metabolism, the proportions of essential genes in this category were similar when cells were cultivated in both MG medium and YE medium, implying fewer alternatives routes for biosynthesis of some terpenoids and N-glycan in cell metabolic network.

Prediction of metabolic engineering strategies for improving triacylglycerol production
Yarrowia lipolytica is capable of producing TAG by the Kennedy pathway (Fig. 3). As an acyl donor, fatty acyl-CoA is mostly supplied by fatty acid synthase system (FAS) (Schweizer et al. 1988;Wang et al. 2011). In addition, acetyl-CoA and malonyl-CoA are essential carbon donor molecules for TAG biosynthesis. In model iYL_2.0, there are multiple pathways that could supply the above essential molecules, including citrate shuttle and conversion from the TCA cycle, amino acids degradation, and other acetyl-CoA generation process ). In addition to TAG production, acetyl-CoA and malonyl-CoA are also used for fatty acid synthesis, amino acid synthesis, and glycerophospholipid synthesis.
To gain additional insight into TAG biosynthesis in Y. lipolytica, our previously published simulation methods OptGeneKnock (Zhang et al. 2015a), IdealKnock (Gu et al. 2016), and APGC (Jian et al. 2016b), were employed to identify metabolic engineering strategies to enhance TAG production in cells grown MG medium. The Opt-GeneKnock algorithm was modified from OptKnock algorithm through LTM , where OptKnock is able to screen the best potential reaction knockout strategies based on a bilevel programming framework. However, IdealKnock algorithm is a topdown framework that scans for the mutants of interest firstly, followed by the identification of the knockout target by iterative searching process. The best single deletion strategy identified by OptGeneKnock was the knockout of carbonate hydrolyase (YALI0F21406g) from the metabolic network, leading to an increase of specific TAG production rate from 0.0126 to 0.0810 mmol/ gDCW/h together with the optimal cell growth. The carbonate hydrolyase associated with enzymatic reaction R0450 is involved in energy metabolism and nitrogen metabolism of Y. lipolytica. By comparison of the in silico metabolic flux distribution before and after the deletion of YALI0F21406g, we observed that more carbon fluxes were distributed into acetyl-CoA and malonyl-CoA metabolism while fluxes of amino acid synthesis pathway and synthesis pathways of other compounds decreased. Moreover, prediction of double deletion strategy with OptGeneKnock indicated that in addition to YALI0F21406g, further deletion of gene YALI0F15587g (d-glyceraldehyde-3-phosphate glyceronetransferase, R0487) could improve TAG production rate. Actually, TAG synthesis was reported to be indirectly enhanced together with the decrease in pentose phosphate pathway flux and increase in glycerol formation when R0487 was removed from Y. lipolytica (Tai and Stephanopoulos 2013).
Meanwhile, another quadruple knockout target (YAL-I0F19514g, YALI0B19382g, YALI0A19206g, and YAL-I0D06325g) was identified by IdealKnock program of the COBRA Toolbox v2.0 (Additional file 3). O-Acyltransferase (YALI0F19514g) is a significant enzyme responsible for the synthesis of phosphatidylcholine from 1-acyl-sn-glycero-3-phosphocholine, which was essential in the last step of TAG biosynthesis (Makri et al. 2010). The deficiency of O-acyltransferase might decrease the flux of phosphatidylcholine to glycerophospholipid synthesis and therefore improve the TAG production directly. The gene YALI0B19382g is associated with three reactions (R0161, R0164, and R0176) in the FAS pathway. The in silico deletion of YALI0B19382g could suppress the fatty acid biosynthesis from fatty acyl-CoA, and the TAG synthesis was enhanced accordingly. Furthermore, the removal of YALI0A19206g and YALI0D06325g responsible for glutamate degradation might contribute to the activation of acetyl-CoA carboxylase and the formation of malonyl-CoA due to increased availability of intracellular glutamate (Yu et al. 2003).
To further investigate gene targets of overexpression for enhanced TAG production in MG medium, we performed APGC algorithm with the use of newly constructed model iYL_2.0. The prediction results showed that 6 genes were identified as potential targets for overexpression (Additional file 3). All these gene targets play important roles in TAG synthetic pathway and these genes are involved in amino acids metabolism, degradation of fatty acids, glycerolipid metabolism, and the growth of biomass. For example, the overexpression of acetyl-CoA:carbon-dioxide ligase (YALI0C11407g) led to an increase in malonyl-CoA synthesis, which hence improved the availability of acetyl-CoA for TAG biosynthesis (Qiao et al. 2015). Three potential gene targets (YALI0F08415g, YALI0C01859g, and YALI0E18238g) categorized in amino acids metabolism are responsible for the conversion of amino acids into acetyl-CoA and malonyl-CoA. In addition, reaction R0310 (YAL-I0D07986g) is the last step of TAG biosynthetic pathway, and overexpressing this step might contribute directly to the production enhancement of TAG, which was reported by Yen et al. (2005). The simulation of the overexpression of the above 6 genes showed a 34.1% improvement of TAG production rate compared to the wide-type strain (from 0.0126 to 0.0169 mmol/gDCW/h).

Conclusions
In this study, a novel GSMM of Y. lipolytica, named iYL_2.0, was reconstructed based on two previous models and metabolic information from updated databases and recent literatures. The new model contains 645 genes, 1083 metabolites, and 1471 reactions across different compartments. This model was found superior to the two previous models when employed to simulate cell growth characteristics on glucose or other carbon sources. Gene essentiality analysis resulted in 111 essential genes for cells grown in MG medium and 66 essential genes when cells cultivated in YE medium. Using model iYL_2.0, double and quadruple gene knockout targets that could facilitate TAG production were identified by OptGeneKnock and IdealKnock, respectively. In addition, 6 gene amplification targets were predicted by APGC framework, which could lead to a 34.1% improvement of TAG production for increasing the availability of important precursors of acetyl-CoA and malonyl-CoA. Further simulation with iYL_2.0 indicated that the additional supplementation of six amino acids, namely l-glycine, l-alanine, l-cysteine, l-serine, l-threonine, and l-aspartate, might largely contribute to the enhancement of TAG production, where the addition of either l-threonine or l-aspartate resulted in a 55.5% increase in TAG production rate. In summary, the newly developed genome-scale metabolic model iYL_2.0 may serve as a powerful platform for better understanding metabolic viability and guiding rational metabolic engineering strategies for Y. lipolytica.  (Degtyarenko et al. 2008), and PubChem (Wang et al. 2009) were used to characterize the annotation of metabolites based on their chemical formulas and neutral charges. Additionally, the reaction reversibilities were judged by BioPath (Schreiber 2002) database and thermodynamic data of metabolites. Compared with iYL619_PCP, two new compartments, the peroxisome and endoplasmic reticulum, were determined according to the other reference model iYali4 and subcellular localization prediction tool CELLO (Yu et al. 2004). Transport reactions and exchange reactions were added for metabolic network connectivity. Moreover, to refine the draft model, metabolic gaps were identified by literature data and the GapFind (Satish Kumar et al. 2007) program in COBRA Toolbox v2.0 (Schellenberger et al. 2011).

Simulation and constraint-based flux analysis
The verification and analysis included optimal cell growth in minimal growth (MG) medium (Lanciotti et al. 2005) and utilization of different carbon sources. Meanwhile, essential genes were analyzed by singlegenedeletion function in COBRA Toolbox v2.0 in MG medium and YE medium (Ye et al. 2015b Fig. 4 Simulation effects of amino acid supplementation on TAG production in Y. lipolytica grown in MG medium CO 2 , and H + were set as 1000 mmol/gDCW/h. The YE medium contains not only basic elements, but also 20 amino acids, and the uptake rates of amino acids were set as 5 mmol/gDCW/h. Biomass equation was set as the objective function for the different conditions, and optimization was performed by GLPK solver. The in silico simulations were performed by flux balance analysis (FBA), which is a constraint-based flux analysis method (Orth et al. 2010;Toya and Shimizu 2013). Meanwhile, under a pseudo-steady-state metabolic model, FBA determines metabolic flux distribution based on the assumption that organisms could endeavor to optimize growth or production. Mathematically, the optimization of objective functions subjected to stoichiometric and capacity constraints can be formulated as follows: Z represents the cellular objective function of all the metabolic reactions where the relative weights are determined by the coefficient c j . S ij corresponds to the stoichiometric coefficient of the metabolite i involved in reaction j. v j denotes the flux or specific rate of metabolic reaction j. v j min and v j max refer to the lower and upper bounds of the flux of reaction j, respectively.

Identification of potential targets through gene-level algorithms
For analysis of TAG production, two novel knockout algorithms OptGeneKnock (Zhang et al. 2015a) and IdealKnock (Gu et al. 2016) and one overexpression algorithm, analysis of production and growth coupling (APGC) (Jian et al. 2016b) were carried out for cells grown in MG medium. The APGC program was applied to identify amplification targets for improving production, and OptGeneKnock and IdealKnock were used to identify knockout targets. The OptGeneKnock algorithm was performed by applying logic transformation of model (LTM) (Zhang et al. 2015a) to OptKnock (Burgard et al. 2003). GLPK solver and Gurobi solver were used for linear programming and quadratic programming, respectively. The synthesis reaction of TAG, R0310, was set as objective function to identify the potential targets, and the minimum specific growth rate of biomass was set as 0.0378 h −1 in MG medium.

Authors' contributions
QH and CZ developed and designed the study. SW carried out the reconstruction of iYL_2.0, performed the statistical analysis, and prepared the manuscript. XJ provided help for implementation of the in silico algorithms. QH and CZ revised the manuscript. All authors read and approved the final manuscript.