Transcriptional regulatory networks of methanol-independent protein expression in Pichia pastoris under the AOX1 promoter with trans-acting elements engineering

To explore the differences in the intracellular transcriptional mechanism in carbon-derepressed and wild-type Pichia pastoris strains fed with three different carbon sources. RNA in carbon-derepressed (Δmig1Δmig2Δnrg1-Mit1; Mut) and wild-type (WT) P. pastoris fed with three different carbon sources (dextrose, glycerol, and methanol) were sequenced. Differentially expressed genes (DEGs) associated with these carbon sources were obtained and clustered into modules using weighted gene co-expression network analysis (WGCNA). Signaling pathway enrichment analysis was performed using KEGG, and protein to protein interaction (PPI) network was also constructed. A total of 2536 DEGs were obtained from three intersections, and some of them were enriched in carbon sources and involved in carbon metabolism, secondary metabolisms, and amino acid biosynthesis. Two modules, MEgreenyellow (involved in protease, oxidative phosphorylation, endoplasmic reticulum protein processing, folate carbon pool, and glycerol phospholipid metabolism pathways) and MEmidnightblue (involved in protease, endocytosis, steroid biosynthesis, and hippo signaling pathways) were significantly correlated with the strain type. Eight hub genes and two sub-networks were obtained from PPI network. Sub-network A enriched in proteasomes pathway while sub-network B enriched in ribosome pathway. The genes involved in carbon metabolism, secondary metabolic, and amino acid biosynthesis pathways changed significantly under different carbon sources. The changes in proteasome and ribosome activities play roles in carbohydrate metabolism in the methanol-free PAOX1 start-up Mut strain.


Introduction
Pichia pastoris (P. pastoris) is a widely used recombinant expression system for exogenous genes having prokaryotic advantages of having a simple operation and rapid growth, and also has eukaryotic cell functions of posttranslational modifications of proteins. Yeast expression systems include Saccharomyces cerevisiae system, methylotrophic yeast expression system, and merozoite yeast expression system (van den Hazel et al. 1996). The methylotrophic yeast P. pastoris expression system is one of the most commonly used expression systems.
The most commonly used promoter on P. pastoris expression system is the alcohol oxidase 1 promoter (AOX1), which is a carbon source-dependent promoter whose transcription expression is strongly induced by methanol, but inhibited by other carbon sources, such as glucose, glycerol, or ethanol (Liang et al. 2014). P. pastoris synthesizes a large number of enzymes when using methanol as the sole carbon source. However, the use of methanol has many disadvantages as it is toxic, flammable, and volatile, which makes methanol harmful to the human body (Polupanov et al. 2012;Portnoy et al. 2011). Methanol fermentation requires large amounts of oxygen, around three to four times higher than the amount needed when the carbon source is glucose (Lin et al. 2000). It also produces hydrogen peroxide (H 2 O 2 ) during methanol metabolism, causing oxidative stress and proteolytic degradation of some proteins (Wang et al. 2010).
Researchers have worked on modifying the expression system of P. pastoris by various methods which avoid the use of methanol as a carbon source (Lu et al. 2007). The most convenient method to avoid methanol dependence was to engineer other highly expressed promoters that can replace AOX1 and work without methanol induction (Chen et al. 2007). AOX1 promoter has several regulatory factors, such as Mig1, Mig2, Nrg1, Mit1, and Prm1. Among these factors, Mig1, Mig2, and Nrg1 inhibit AOX1 promoter in cells grown in glucose and glycerol, while Mit1 and Prm1 activate AOX1 promoters in cells grown in methanol (Prielhofer et al. 2013). Nevertheless, the regulatory mechanism of AOX1 promoter in P. pastoris remains unclear.
An earlier study (Lin-Cereghino et al. 2006) carried out a large-scale mutation screening analysis to identify regulatory factors and to elucidate their mechanisms and the studies also not recorded how to construct an engineered strain or to verify the success of the strain construction. To understand the effects of transcriptional repressors on AOX1 and the transcriptional differences between carbon-derepressed strain (Mut) and wild-type (WT) strain, we modified expression levels of carbon source repressors in P. pastoris . Mig1, Mig2, and Nrg1 which act as carbon source repressors were inhibited and Mit1 levels were enhanced in the carbon-derepressed strain ). In the present study, we analyzed the transcriptomes of WT and the carbonsource derepressed strain Δmig1Δmig2Δnrg1-Mit1 (Mut) of P. pastoris under different carbon sources using RNAseq method. Differential expression genes (DEGs) analysis was applied on different carbon sources with Mut and WT strains. Weighted gene co-expression network analysis (WGCNA) and Kyoto encyclopedia of genes and genomes (KEGG) enrichment were used to explore the effect of different carbon sources on the efficiency of gene expression. Protein-protein interaction (PPI) network analysis was used to obtain key sub-networks and hub genes with Mut and WT strains. The goal of the study was to investigate the mechanisms underlying the carbon derepression and AOX1 promoter regulation of the methanol-free P AOX1 start-up ∆mig1∆mig2∆nrg1-Mit1 strain.

Pichia pastoris strains and culture conditions
Pichia pastoris wild-type (WT) strain GS115 (Life Technologies, Carlsbad, CA, USA) and the genetically engineered P. pastoris strain Δmig1Δmig2Δnrg1-Mit1 (Mut) (constructed and preserved in our laboratory) were used for this study. A chemostatic cultivation medium for recombinant P. pastoris was used to provide a stable environment to ensure the reliability and repeatability of RNA-seq analysis. In brief, Mut and WT strains (stored at − 80 °C) were inoculated directly into 250-mL shaking flasks containing 50 mL yeast nitrogen-base/dextrose (YND) medium (0.67% yeast nitrogen base (YNB) and 1% glucose) or into 50 mL minimal glycerol media (MGY) as the primary seed culture till OD 600 = 2-6, respectively, and cultivated at 30 °C for 24 h with agitation (200 rpm). Then, Mut and WT seed cultures were used to inoculate into 500 mL YND or MGY medium (0.67% YNB + 1% glycerin) till OD 600 = 2-6, respectively, as the second seed cultures and the cultures were then cultivated for 16 h at 30 °C (200 rpm). Batch cultivated was started by inoculating the secondary seeds at a concentration of 1:10 (v/v) into a fermenter containing basal salts media (BSM). Batch fermentations were carried out at 30 °C, pH 5.0 (controlled by 25% ammonium hydroxide), and a minimum of 30% dissolved oxygen concentration. When carbon source in the medium was depleted and the concentration of dissolved oxygen increased sharply, the chemostat medium using glucose (20.0 g L −1 ), glycerol (20.0 g L −1 ), or methanol (20.0 g L −1 ) as the sole carbon source was supplemented. The fermentation broth is drained by a peristaltic pump to maintain a constant volume of the fermentation broth and the flowrate of glucose, glycerol, or methanol. Finally, they were inoculated into a 50 mL liquid MGY medium with an initial concentration of OD 600 = 1, and two parallel samples were inoculated for each strain. Samples were taken when the specific growth rate reached μ = 0.03 h −1 .

RNA extraction, construction of cRNA Library and High-throughput sequencing
The detection of RNA samples was provided by Beijing Novosource Bioinformation Technology Co. RNA concentration was measured using Qubit ® RNA Assay Kit with Qubit ® 2.0 Fluorometer (Life Technologies, CA, USA), and RNA integrity was assessed with the RNA Nano 6000 Assay Kit by the Agilent 2100 Bioanalyzer (Agilent Technologies, USA), and only the qualified samples were used for library construction and sequencing experiments. As shown in Table 1, a total of 12 samples were used for RNA-seq analysis, including WT strains in glucose, glycerol, and methanol conditions (named as W_D, W_G, and W_M) and Mut strain in the same three conditions (named as M_D, M_G, and M_M). Each sample was done in duplicates. Illumina Hiseq2000 (Illumina, NY, USA) high-throughput sequencing system was used to sequence RNA of the samples.

Data pre-processing
Quality control of raw reads was processed by removing reads containing adapter sequences and reads with N (information cannot be determined) bases greater than 10%. Consequently, clean reads were obtained for downstream analysis. Alignment of the paired-end clean reads to the coding sequence (CDS) of GS115 in The European Bioinformatics Institute (EMBL) database was performed using TopHat v2 (Kim et al. 2013). Based on the predicted gene model of Cufflinks (Trapnell et al. 2012), the alternative splicing event on each sample was classified and counted by Asprofile software (Florea et al. 2013). All the data obtained from reads were assembled together with Cufflinks and then compared with known gene models by Cuffcompare software to predict new transcripts. Gene expression levels were quantified as the number of reads per kilobase of exon region per million reads (RPKM).

DEG analysis
DEGs were analyzed using the R package DEseq (Anders and Huber 2010; Anders and Huber 2012). For the samples with biological repetition, the original count was corrected by DEseq and the log of the count ratio between the two groups was calculated. Finally, log 2 FoldChange (FC) value was obtained. Benjamini & Hochberg's method (Benjamini and Hochberg 1995) was used for multiple testing correction and genes with an adjusted P-value < 0.05 was selected as the cut-off criterion for the identification of DEGs.
DEGs were analyzed for growth in different sources, including W_G vs. W_D, M_G vs. M_D, W_M vs. W_D, M_M vs. M_D, W_M vs. W_G, and M_M vs. M_G and a total of 6 groups of DEGs were analyzed. Then, we analyzed the DEGs intersection between Mut and WT strains in the same carbon comparison, such as the intersection DEGs between W_G vs. W_D and M_G vs. M_D. Then, a total of three groups of intersection DEGs were determined. Finally, we combined these three groups of intersection DEGs together, which were thought to be genes related to carbon sources.

WGCNA network construction
Using weighted gene co-expression analysis (WGCNA) R package (Langfelder and Horvath 2008), a co-expressed network of DEGs related to their carbon sources was constructed, and further screening of the gene sets which were significantly related to their strain type, and genes which were considered to be both related to the carbon source and the strain type were selected. Two methods were used to find modules related to the strain type. First, we calculated the correlation between each module feature vector gene and strain types. Second, the mean value of the correlation between the trait and gene expression in each module was taken as the significance. The greater the significance, the more the correlation between the module and the trait. By selecting the weight value of power, the whole network was distributed as a scale-free network, and the power with structure in R 2 reached 0.8 would be generally selected. Because of batch-related effects, sample heterogeneity, and experimental conditions used, we selected power = 9 and then, based on clustering and dynamic pruning of the data (Barabási 2009), we clustered the highly correlated genes into modules. Finally, gene sets were identified significantly related to the type of strain.

Functional enrichment analysis
KOBAS 3.0 (http://kobas .cbi.pku.edu.cn/anno_iden. php) (Xie 2011;Wu et al. 2006) software was used to analyze the Kyoto Encyclopedia of Genes and Genomes (KEGG) database (Kanehisa and Goto 2000) with DEGs related to carbon sources. Then, we also analyzed the KEGG database for genes obtained from the modules of WGCNA analysis with a cut-off of P value < 0.05 and gene count ≥ 2, and these pathways were significantly related to the strain type and carbon source.

Construction of protein to protein (PPI) networks
Search tool for the retrieval of interacting genes (STRING v 10.0) (Szklarczyk 2014) was used to explore interactions between proteins. Komagataella pastoris was selected as the reference species to predict whether there were interactions between matched protein sets. Parameter PPI score was set to 0.7 (high confidence), which requires the interacting protein nodes to be included in the above module genes. Then, the PPI network was built by Cytoscape software (version 3.2.0) (Shannon 2003) and cytoNCA plug-in (Tang et al. 2015) (v.2.1.6, http://apps.cytos cape.org/apps/cyton ca) was used to analyze the topological properties of the node network.
The results included degree centrality (DC), Bweenness Centrality (BC), and closeness centrality (CC), and the hub proteins were obtained by ranking the network topological properties of each node. We used MCODE (Degree Cutoff = 2, Node Score Cutoff = 0.2, K-core = 2, MaxDepth = 100) (Bandettini et al. 2012) in Cytoscape to analyze the subnetworks in the PPI network. The modules with score > 5 were chosen to perform the KEGG pathway analysis using KOBAS 3.0.

DEGs analysis
After data pre-processing, a total of 5381 genes in the gene matrix were obtained from 12 samples. According to the cut-off criterion used, a pairwise comparison was used to compare the three different carbon sources (Fig. 1a). There were 399 overlapping DEGs between W_G vs. W_D and M_G vs. M_D. There were 1863 overlapping DEGs between W_M vs. W_D and M_M vs. M_D. In addition, there were 2098 overlapping DEGs between W_M vs. W_G, and M_M vs. M_G. By combining the union DEGs of the Mut strain and WT strain grown under different carbon sources, 2536 DEGs were detected as genes related to carbon sources. Bidirectional hierarchical clustering was carried out to analyze these 2536 DEGs (Fig. 1b). The results of bidirectional hierarchical clustering showed that DEG analysis only separated methanol samples from the three carbon sources, while glucose and glycerol samples were aggregated in the other branches. However, DEGs in glucose and glycerol samples were separated according to the Mut or WT types. Thus, we inferred that these genes were not only associated with carbon source treatment, but also potentially related to the strain type, laying the foundation for our next coexpression analysis. The KEGG enrichment analysis results of the 2536 DEGs showed that with the change of carbon source, the genes involved in carbon metabolism, secondary metabolic pathway, and amino acid biosynthesis changed significantly ( Table 2).

Screening of related modules and genes by WGCNA
To further screen genes related to strain type, we performed WGCNA analysis for the 2536 DEGs (Fig. 2). Based on the clustering and dynamic pruning method, 2536 DEGs were divided into 20 modules (the grey module was a collection of genes that cannot be aggregated into other modules) (Fig. 2a). Modules with heterogeneity coefficient < 0.2 were merged together, and finally 10 modules were obtained (Fig. 2b, c). By calculating the correlation coefficient between the module eigengene (ME) and sample traits, 4 modules (green-yellow: 122 genes, midnight-blue: 266 genes, grey60: 44 genes, light green: 41 genes) were significantly correlated with the strain type (p < 0.05) (Fig. 3a). Both MEgreenyellow and MEmidnightblue modules were significantly correlated with the type of strain (correlation coefficient > 0.9). Genes in the MEmidnightblue module were down-regulated in Mut vs. WT, and genes in the MEgreenyellow module were up-regulated in Mut vs. WT (Fig. 3c-d).

Functional enrichment pathway analysis
The genes in the MEgreenyellow and MEmidnightblue modules were analyzed by KEGG pathway enrichment, and the results are shown in Fig. 3b. The genes in the MEgreenyellow module were mainly enriched in proteases, oxidative phosphorylation, endoplasmic reticulum protein processing, folate carbon pool, and glycerol phospholipid metabolism pathway, while the genes in the MEmidnightblue module were enriched in proteases, endocytosis, steroid biosynthesis, and hippo signaling pathway (Fig. 3b).

Construction of PPI networks
We performed PPI analysis of 388 DEGs in MEgreenyellow and MEmidnightblue modules, and a total of 502 PPI edges and 155 proteins were obtained, as shown in Fig. 5a (Additional file 1: Table S1). The topological properties of PPI network were analyzed. Eight proteins (XP_002489693.1, XP_002490020.1, XP_002491845.1, XP_002492676.1, XP_002493471.1, XP_002493717.1, XP_002493779.1, and XP_002494220.1) obtained high scores in DC, BC, and CC within the list of top 30 proteins (Table 3), and these 8 proteins were deemed as hub proteins. Five sub-networks were identified by the PPI network function, of which 2 (sub-network A and subnetwork B) had a score > 5 (Fig. 5b, c). Sub-network A contained 25 genes, of which 8 belonged to the MEgreenyellow module, and the other 17 genes belonged to the MEgreenyellow module. All genes in sub-network B belonged to the MEmidnightblue module. KEGG pathway enrichment was applied to sub-network A and subnetwork B, and the results showed that sub-network A was enriched in the proteasome pathway, while sub-network B was enriched in the ribosome pathway.

Discussion
The commonly used and tightly regulated promoter used to drive heterologous protein expression in P. pastoris is the AOX1 promoter; the gene expression of AOX1 is repressed in carbon sources (Brierley et al. 1990). A P. pastoris mutant strain expressing AOX1-lac2 fusion has been demonstrated to be able to use sorbitol, alanine, trehalose, or mannitol as the sole carbon source (Inan and Meagher 2001). The deletion of the hexose transporter PpHXT1, leading to glucose or fructose-induced expression of the AOX1 gene and pexophagy, is associated with catabolite repression of Aox in P. pastoris . We previously described a novel Mut strain of the AOX1 promoter-based P. pastoris expression system, ∆mig1∆mig2∆nrg1-Mit1 ). However, the mechanism underlying the catabolite and AOX1 regulation of this expression system should be explored. In this study, we compared the gene expression patterns of the Mut strain cultured in three carbon sources (glucose, glycerol, and methanol). A total of 2536 carbon sourceassociated DEGs were detected from the intersection of three groups, and 388 DEGs were found to be associated with both the carbon source type and strain type (Mut or WT). Further investigation of these DEGs indicated the significant involvement of the proteasome and ribosome pathways. Moreover, several hub genes related to AOX1 regulation and carbon sources were identified. When the two P. pastoris strains (WT and Mut) were induced by different carbon sources, the DEGs were mainly associated with carbon metabolism, secondary metabolic biosynthesis, and amino acid biosynthesis. Carbon metabolism in microbial cells mainly involves glycolysis, the pentose phosphate pathway, and the three 2. c The systematic clustering tree of genes and the gene modules generated by dynamic shearing in which different colors represent different gene modules, while the grey part represents genes that cannot be merged into any other module carboxylic acid (TCA) cycle (Zhang et al. 2012). Both the TCA and glyoxylic acid cycles provide energy and precursors for microorganisms (Trapnell et al. 2014). Wang and Yan (2017) showed that when P. pastoris was cultured with glycerol and methanol. Genes associated with the first stage of glycolysis were expressed at the basal level. Dihydroxy acetone phosphate and dihydroxy acetone phosphate were the methanol metabolites in the peroxisome. These metabolites were then transported through the peroxisomal membrane to the cytoplasm and entered the second stage of glycolysis. In a previous study, the growth status of yeast with glucose and methanol as carbon sources was investigated. The metabolic flow of TCA was found to have decreased by three times under methanol growth conditions, while no significant change in transcription or protein levels in the TCA cycle was observed (Russmayer et al. 2015). These results indicate that the TCA cycle may supply different precursor The glycerol transporter GT1 may participate in the process of P AOX1 inhibition of P. pastoris in a medium containing glycerol, and knockdown of this gene can eliminate the glycerol repression of P AOX1 (Zhan et al. 2016). In this study, we found several hub proteins in the PPI network of the 388 genes in two significant modules, including XP_002489693.1 (one of six ATPases of the 19S regulatory particle of the 26S proteasome involved in degradation), XP_002490020.1 (ubiquitin-protein ligase (E3) that interacts with Rpt4p and Rpt6p), XP_002491845.1 (metalloprotease subunit of the 19S regulatory particle of the 26S proteasome lid), XP_002492676.1 (Alpha 4 subunit of the 20S proteasome), XP_002493471.1 (Alpha 1 subunit of the 20S core complex of the 26S proteasome), XP_002493717.1 (one of six ATPases of the 19S regulatory particle of the 26S proteasome), XP_002493779.1 (Adenylate cyclase, required for cAMP production and cAMP-dependent protein kinase signaling), and XP_002494220.1 (Beta 6 subunit of the 20S proteasome). These proteins were mainly associated with the proteasome and ribosome; one of these proteins was ubiquitin ligases E3, which was a proteolytic enzyme guiding relative protein degradation in the proteasome. The proteasome, which has many catalytic functions and can selectively degrade proteins in cells, is a polyclonal macromolecule complex widely distributed in organisms (Tomko and Hochstrasser 2011). To remove the proteins that the cell does not need for functioning as soon as possible, eukaryotic cells can increase the expression of the proteasome by upregulated transcription of genes involved in the proteasome. Ubiquitin-proteasome-linked protein degradation of the key gluconeogenic enzyme, fructose-1,6-bisphosphatase, is associated with glucose uptake and carbohydrate metabolism of P. pastoris (Santt et al. 2008). Ribosomes are the synthesis machine of proteins, and they are also an important site for cells to perceive the nutrition level within cells. Studies (Beilharz and Preiss 2004;Prielhofer et al. 2015) reported that the proportion of polyribosomes in total ribosomes (40s, 60s, 80s/monosomes and polysomes) increased by approximately twofold, as detected by RNA-seq sequencing, when the different carbon sources induced the expression of proteins in P. pastoris. When methanol was used as the carbon source, high levels of protein expression were observed. At this time, a high number of amino acids need to be synthesized and more ribosomes need to participate in protein synthesis (Prielhofer et al. 2015). Therefore, we suggested that the changes in ribosomal and proteasomal activities were associated with carbohydrate metabolism and glucose-mediated as well as glycerol-mediated induction of P AOX1 in P. pastoris, indicating that the carbon derepressed expression system actually causes protein degradation and amino acid synthesis to meet the demand for heterologous protein secretion. However, the signal transduction mechanism that induces the derepression of AOX1 in ∆mig1∆mig2∆nrg1-Mit1 remains to be investigated further.
In the carbon metabolism network, several genes were up-regulated in M_G and M_D compared with W_G and W_D, such as DASL and FLD1. DAS1 encodes dihydroxyacetone synthase Das and FLD1 encodes formaldehyde dehydrogenase Fld. They are all critical enzymes in the methanol utilization pathway. Both P DAS1 and P FLD1 Fig. 5 Protein-protein interaction (PPI) network. a PPI network with midnight-blue and green-yellow modules. b, c Sub-networks of PPI network, genes indicated in blue color from midnight blue module, representing down-regulated genes, and genes indicated green color from green-yellow module, representing up-regulated genes are regulated by methanol (Vogl and Glieder 2013). In the present study, the higher expression level of DASL and FLD1 may play a critical role in carbon metabolism of the dextrose and glycerol medium. TKT plays an important role in the pentose phosphate acid cycle and the photosynthetic reduced pentose phosphate cycle. In the group M_M, the expression level of TKT was down-regulated compared with the group M_G and M_D. This result may indicate that in the case of methanol as the sole carbon source, biotransformation efficiency of the carbon source may be lower in M_M than M_G and M_D. Further results require experimentation to verify.

Conclusion
In conclusion, with the change of the carbon source, the genes involved in the carbon metabolism pathway, secondary metabolic pathway, and amino acid biosynthesis pathway changed significantly in P. pastoris. Changes in the proteasome and ribosome activities played a significant role in the carbohydrate metabolism of the methanol-free P AOX1 start-up ∆mig1∆mig2∆nrg1-Mit1 strain. Such an expression system could be an efficient alternative to conventional production methods using methanol as the carbon source.
Additional file 1: Table S1. Proteins in the PPI network.