Materials
Celluclast, a commercial cellulase from Trichoderma reesei (Lot # CCN03141), was donated by Novozymes (Novo, Bagsvaerd, Denmark). p-Aminophenyl β-D-cellobioside (sc-222106, Lot #K213), used as an affinity ligand for cellobiohydrolase, was purchased from Santa Cruz Biotechnology Inc. (Santa Cruz, CA, USA). All other chemicals required for the protein purification and hydrolysis experiments were purchased from Sigma-Aldrich (Milwaukee, WI). Whatman No. 1 filter paper (Whatman, Inc., Florham Park, NJ) and cotton balls (Kroger Co., Cincinnati, OH) were used as the pure cellulose samples for the hydrolysis experiments. The commercial β-glucosidase (Novozyme 188) from Aspergillus niger was purchased from Sigma-Aldrich (Milwaukee, WI).
Model development
Stochastic hydrolysis model
Development of this comprehensive model, which incorporates cellulose structural details and complex enzyme–substrate interactions, consisted of building a representative in silico cellulose model, characterizing the enzymes, and developing algorithms for modeling the enzyme actions. In this model, cellulose was modeled on the structure of cellulose Iβ, the most abundant cellulose form in higher plants. The structure was modeled as a group of microfibrils (MF) (2–20 nm diameter), each containing multiple elementary fibrils (EF); the EF is the basic building block of cellulose, about 3.5 nm in diameter and containing 36 glucose chains (Chinga-Carrasco 2011; Fan and Lee 1983; Lynd et al. 2002). The number of EF in an MF and the number of glucose molecules in one glucose chain (i.e., degree of polymerization, DP) were determined dynamically at the beginning of the cellulose structure simulation, based on the type of cellulose simulated, and were assumed to be constant during each simulation. The degree of crystallinity in cellulose (50–90%) is a critical factor affecting cellulose hydrolysis, as amorphous regions are believed to be more susceptible to enzyme action and to determine initial hydrolysis rates. To capture this property in the model, the glucose chains in each EF were assumed to pass through multiple crystalline regions (each 200 glucose molecules long) separated by amorphous regions. The concept of the modeled cellulose structure and its resemblance to the actual cellulose structure are illustrated in Fig. 2.
Each glucose molecule in the modeled microfibril was given a unique serial number as its identity, together with a data set of other parameters (e.g., reducing/non-reducing end, EF surface, MF surface, crystalline or amorphous, soluble or non-soluble, distance from chain end, etc.) that describe the structural properties of that molecule. When developing the algorithms for cellulase actions, enzyme accessibility was determined from these per-molecule parameters and the action pattern of each enzyme. For additional details of the cellulose model, please refer to earlier publications (Kumar 2014; Kumar and Murthy 2013).
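As an illustration of the per-molecule bookkeeping described above, the following C++ sketch stores a few of the listed attributes for each glucose unit and lays out one chain as alternating crystalline and amorphous stretches. The names `GlucoseUnit` and `buildChain`, the amorphous stretch length, and the choice to start each chain with a crystalline region are assumptions made for illustration only; they are not taken from the published implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-glucose record holding a subset of the attributes named in the text.
struct GlucoseUnit {
    std::size_t id;                    // unique serial number
    bool crystalline;                  // true in a crystalline region, false in an amorphous one
    bool onEFSurface;                  // lies on the elementary fibril surface
    bool onMFSurface;                  // lies on the microfibril surface
    bool soluble;                      // belongs to a chain that has become soluble
    std::size_t distanceFromChainEnd;  // distance (in glucose units) to the nearer chain end
};

// Build one glucose chain of length dp as repeating crystalline/amorphous stretches.
// crystallineLen would be 200 per the text; amorphousLen is an assumed free parameter.
std::vector<GlucoseUnit> buildChain(std::size_t dp, std::size_t firstId,
                                    std::size_t crystallineLen, std::size_t amorphousLen) {
    assert(crystallineLen > 0);
    std::vector<GlucoseUnit> chain;
    chain.reserve(dp);
    const std::size_t period = crystallineLen + amorphousLen;
    for (std::size_t k = 0; k < dp; ++k) {
        GlucoseUnit g{};  // all flags start as false
        g.id = firstId + k;
        g.crystalline = (k % period) < crystallineLen;
        g.distanceFromChainEnd = (k < dp - 1 - k) ? k : dp - 1 - k;
        chain.push_back(g);
    }
    return chain;
}
```

Surface flags are left unset here because they depend on where the chain sits within the EF and MF, which this sketch does not model.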
Cellulase enzymes vary in their mode of action, and for this model the enzymes were classified into eight classes depending on their structure and mode of action (e.g., non-processive endocellulase with a carbohydrate-binding module (CBM), processive CBH I with CBM, processive CBH II with CBM, etc.). Please refer to our earlier paper (Kumar and Murthy 2013) for more details on the enzyme classification, characteristics, and modes of action. Cellulose hydrolysis depends on biomass-dependent extrinsic factors (crystallinity, accessibility, and DP), whereas enzyme action depends on intrinsic factors (enzyme activity, stability with pH and temperature, etc.). The extrinsic factors were modeled in the simulated cellulose structure described above. Enzyme activity (which depends on enzyme origin and level of purification) and enzyme loading (amount of enzyme/g substrate, based on experimental conditions) were transformed into a theoretical maximum turnover number, N_hi_max (the maximum possible number of bonds hydrolyzed per unit time), for each class of enzyme (Eq. 1).
$$N_{\text{hi\_max}} = E_{i} \times U_{i} \times 6.023 \times 10^{17} \times \frac{G_{\text{sim}}}{6.023 \times 10^{23}} \times 162 \times S_{i},$$
(1)
where E_i is the amount of the ith enzyme used (mg enzyme/g cellulose); U_i is the activity of the ith enzyme (IU/mg enzyme); 6.023 × 10^17 is the number of molecules in 1 µmol (1 IU corresponds to 1 µmol of substrate converted per minute); G_sim is the number of glucose molecules simulated in the model; 162 is the average molecular weight of anhydrous glucose; and S_i is the stability of the ith enzyme under the experimental conditions (temperature and pH). The value of S_i, a real number between 0 and 1, can be calculated for any enzyme using empirical equations, such as the Arrhenius rate relationship for temperature.
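To make the unit bookkeeping in Eq. 1 concrete, the following C++ sketch evaluates it term by term. The function and parameter names are hypothetical, the enzyme loading is taken in mg enzyme/g cellulose, and the result is expressed per minute because 1 IU is 1 µmol/min; it is a minimal sketch rather than the published code.

```cpp
#include <cassert>

// Eq. 1: theoretical maximum number of bonds hydrolyzed per minute by enzyme class i,
// scaled to the amount of cellulose actually simulated.
//   loadingMgPerG    : E_i, enzyme loading (mg enzyme / g cellulose)
//   activityIUPerMg  : U_i, specific activity (IU / mg enzyme)
//   glucoseSimulated : G_sim, number of glucose molecules in the simulated structure
//   stability        : S_i, stability factor in [0, 1]
double maxBondsPerMinute(double loadingMgPerG, double activityIUPerMg,
                         double glucoseSimulated, double stability) {
    assert(stability >= 0.0 && stability <= 1.0);
    const double moleculesPerMicromole = 6.023e17;  // constant as written in Eq. 1
    const double avogadro = 6.023e23;               // constant as written in Eq. 1
    const double anhydroglucoseMW = 162.0;          // g/mol
    // Mass (g) of cellulose represented by the simulated glucose molecules.
    const double gramsSimulated = glucoseSimulated / avogadro * anhydroglucoseMW;
    return loadingMgPerG * activityIUPerMg * moleculesPerMicromole * gramsSimulated * stability;
}
```

For example, a loading of 10 mg/g, an activity of 0.08 IU/mg, 10^6 simulated glucose molecules, and S_i = 1 give about 129.6 bonds per minute.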
These numbers were further transformed into numbers of hydrolyzed bonds per microfibril, based on the total number of microfibrils simulated and the mode of action of each enzyme. For endoglucanases, for example, the numbers were proportional to the relative number of glucose molecules on the surface of the microfibril. For CBH I and CBH II, on the other hand, they were proportional to the relative number of chain ends available in that microfibril. Please refer to Kumar (2014) for more details.
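The proportional split described above can be sketched as follows. The helper `allocateBonds` is hypothetical: it divides a total bond budget among microfibrils in proportion to a per-microfibril site count, which would be surface glucose molecules for EG or available chain ends for CBH I and CBH II. The exact weighting used in the published model is described in Kumar (2014) and may differ.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Split totalBonds among microfibrils in proportion to siteCounts[m], the number of
// relevant sites (surface glucose units or chain ends) in microfibril m.
std::vector<double> allocateBonds(double totalBonds, const std::vector<double>& siteCounts) {
    const double totalSites = std::accumulate(siteCounts.begin(), siteCounts.end(), 0.0);
    assert(totalSites > 0.0);
    std::vector<double> share;
    share.reserve(siteCounts.size());
    for (double n : siteCounts) {
        share.push_back(totalBonds * n / totalSites);
    }
    return share;
}
```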
The hydrolysis process was modeled using the Monte Carlo simulation technique, which has previously been used successfully for modeling starch hydrolysis (Marchal et al. 2001, 2003; Murthy et al. 2011; Wojciechowski et al. 2001). The overall schematic for simulating the enzymatic hydrolysis for each enzyme is shown in Fig. 3, and a detailed description is provided in Kumar and Murthy (2013). All the required substrate–enzyme interactions, such as binding of CBH only at chain ends and the higher probability of EG binding on the MF surface than on the EF surface, were incorporated into the model through the algorithms. The algorithms also ensured that sufficient glucose molecules (based on the size of the enzyme) were available to allow binding.
Only one class of enzymes was modeled as working at a time, so the model did not account for the enzyme crowding effect (locations occupied by enzymes of other classes at the same time). These effects were incorporated in the modified model discussed in the next section. In addition to the cellulose structural restrictions, some probabilities were defined for the enzyme actions. For example, for an endoglucanase, the probability of hydrolyzing a β-1,4 bond located in an amorphous region was higher than that of hydrolyzing a bond in a crystalline region. Each choice was made by generating a random number at the decision point and comparing it with the defined probability. The hydrolysis event would happen only when the random number was greater than the probability of hydrolysis. The number of iterations was restricted using a counter (Fig. 3). If all conditions for hydrolysis were met for a bond, it was converted to a broken bond and the counter was incremented. The counter was also incremented for unsuccessful events (when binding or hydrolysis did not occur). After each bond was broken, the properties of the other glucose molecules in that chain (e.g., chain length, distance from chain end, solubility, etc.) were updated. If a glucose chain becomes soluble, the part of the chain just beneath the soluble chain is exposed and becomes accessible to enzymes. The concept is described in detail elsewhere (Kumar 2014).
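The counter-limited accept/reject loop described above can be sketched in C++ as follows. This is a deliberately reduced illustration for an endoglucanase-like enzyme, not the published algorithm: the `Bond` type, the function name, the uniform choice of candidate bond, and the increment of 1 for every attempt are all assumptions, and surface accessibility, enzyme footprint checks, and the update of chain properties are omitted. The accept/reject draw is delegated to `std::bernoulli_distribution`, so an event fires with the stated probability regardless of how the comparison is written internally.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical minimal bond record.
struct Bond {
    bool crystalline;  // bond lies in a crystalline region
    bool broken;       // bond has already been hydrolyzed
};

// Attempt hydrolysis events until the counter reaches maxCount.
// Returns the number of bonds broken during this call.
std::size_t runEndoglucanaseStep(std::vector<Bond>& bonds, double pAmorphous,
                                 double pCrystalline, std::size_t maxCount,
                                 std::mt19937& rng) {
    assert(!bonds.empty());
    std::uniform_int_distribution<std::size_t> pick(0, bonds.size() - 1);
    std::size_t counter = 0;
    std::size_t brokenNow = 0;
    while (counter < maxCount) {
        Bond& b = bonds[pick(rng)];
        ++counter;  // successful and unsuccessful attempts both advance the counter
        if (b.broken) {
            continue;  // nothing left to hydrolyze at this position
        }
        const double p = b.crystalline ? pCrystalline : pAmorphous;
        if (std::bernoulli_distribution(p)(rng)) {
            b.broken = true;
            ++brokenNow;
        }
    }
    return brokenNow;
}
```

Seeding `std::mt19937` with a fixed value makes a run reproducible, which is convenient when comparing parameter sets.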
Modifications in the model
The model described in the section above was the first report of a comprehensive stochastic model for cellulose hydrolysis that successfully captured the cellulose structural features (three dimensional), enzyme characteristics, and dynamic enzyme–substrate interactions. In this work, the model was further modified to capture more realistic interactions during hydrolysis by incorporating (1) the simultaneous action of enzymes from multiple classes at any instant of time, to account for enzyme crowding; (2) the partial solubility of cello-oligomers with DP 6–13; and (3) the production of glucose by exocellulase. The previous version of the model was simulated on the iterative concept only; in real conditions, however, multiple enzyme molecules act simultaneously and block hydrolysis sites for each other (Igarashi et al. 2011). Enzyme crowding and the simultaneous action of enzymes were incorporated in the current model by calculating the number of enzyme molecules from the enzyme loading, the molecular weight of the enzymes, and the number of glucose molecules simulated. The iterations are performed for every minute of hydrolysis, and the properties of the substrate are updated at the end of each 1-min time step. For processive enzymes, an enzyme molecule that has bound to a chain end remains bound at the end of the 1-min time step and continues further down the chain until it reaches the end of the chain or desorbs from the chain according to its desorption probability. An exocellulase enzyme binds to multiple cellulose chains (three chains in the model) (Asztalos et al. 2012; Levine et al. 2010), so all three chains must be accessible to the enzyme (on the surface and not blocked by another enzyme) for binding to occur.
In the previous version of the model, it was assumed that only as many glucose molecules as the size of the CBM needed to be on the surface and unblocked for binding; in the current model, the whole length of the enzyme was considered (except the linker, because it is flexible and is compressed during movement) (Wang et al. 2012). The detailed schematics explaining the algorithms developed to model CBH I and EG actions are provided in Additional files 1 and 2, respectively. Cellodextrins with DP < 6 are considered to be completely soluble, those with DP 6–13 partially soluble, and those with DP above 13 insoluble in water (Lynd et al. 2002; Zhang and Lynd 2004). In the previous version of the model, all oligomers with DP > 6 were considered insoluble. While the CBM of the enzymes cannot bind to these chains due to its large size, the catalytic domains of the enzymes will still act on the oligomers in solution. In the absence of reliable literature data, the soluble fraction of the oligomers was set as a function of DP in the range of DP 7–13. The oligomers with DP 7–9, 10–11, and 12–13 were assumed to be 75, 50, and 25% soluble, respectively. Oligomers with DP < 6 were assigned 100% solubility, while oligomers with DP greater than 13 were set to 0% solubility. In the previous model, the CBH action could only produce cellobiose during cellulose hydrolysis. However, glucose formation during cellulose hydrolysis by CBH action has been observed by some researchers (Eriksson et al. 2002; Medve et al. 1998) and was also observed in our experiments (discussed later in the “Results and discussion” section). Therefore, the model was modified to include glucose formation in addition to cellobiose formation. A probability of glucose formation was included in the model, and the choice between glucose and cellobiose formation was made by generating a random number and comparing it with that probability. The probabilities and increments associated with the various events (productive binding, no binding, non-productive binding, etc.) are listed in Additional file 3: Table S1.
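The solubility rule and the glucose/cellobiose choice described above can be written compactly as the following C++ sketch. Both function names are hypothetical. The text assigns 100% solubility to DP < 6 and 75% to DP 7–9 but does not state a value for DP 6 explicitly, so treating DP 6 as fully soluble is an assumption of this sketch; the probability of glucose formation is left as a parameter because its value is given only in Additional file 3: Table S1.

```cpp
#include <cassert>
#include <random>

// Soluble fraction of a cello-oligomer as a step function of its DP.
double solubleFraction(int dp) {
    assert(dp >= 1);
    if (dp <= 6) return 1.0;    // DP < 6 per the text; DP 6 itself is an assumption here
    if (dp <= 9) return 0.75;   // DP 7-9
    if (dp <= 11) return 0.50;  // DP 10-11
    if (dp <= 13) return 0.25;  // DP 12-13
    return 0.0;                 // DP > 13: insoluble
}

// DP of the product released by one CBH cut: 1 (glucose) with probability pGlucose,
// otherwise 2 (cellobiose).
int cbhProductDP(double pGlucose, std::mt19937& rng) {
    assert(pGlucose >= 0.0 && pGlucose <= 1.0);
    return std::bernoulli_distribution(pGlucose)(rng) ? 1 : 2;
}
```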
The enzyme crowding/jamming phenomenon might not be critical at low enzyme dosages or during the action of individual enzymes. Also, the other details incorporated into this model might be ignored if the final goal of the model is only to simulate the sugar concentration during the hydrolysis process. However, to simulate and optimize the composition of an enzyme cocktail, it is necessary to simulate the effects of each enzyme class carefully.
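The number of enzyme molecules acting on the simulated substrate, which this section derives from the enzyme loading, the enzyme molecular weight, and the number of glucose molecules simulated, can be estimated with a simple mass balance. The C++ sketch below is one plausible form of that calculation and is an assumption, since the exact expression is not given here: the simulated cellulose mass is G_sim × 162 / N_A grams, the enzyme mass follows from the loading, and dividing by the molecular weight and multiplying by N_A gives molecules, so Avogadro's number cancels.

```cpp
#include <cassert>

// Estimated number of enzyme molecules present for the simulated amount of cellulose.
//   loadingMgPerG    : enzyme loading (mg enzyme / g cellulose)
//   enzymeMWGPerMol  : enzyme molecular weight (g/mol), e.g. 64000 for a 64 kDa protein
//   glucoseSimulated : number of glucose molecules in the simulated structure
double enzymeMoleculeCount(double loadingMgPerG, double enzymeMWGPerMol,
                           double glucoseSimulated) {
    assert(enzymeMWGPerMol > 0.0);
    const double anhydroglucoseMW = 162.0;  // g/mol
    const double gramsPerMg = 1.0e-3;
    // (g enzyme / g cellulose) * (G_sim * 162 / N_A g cellulose) / MW * N_A
    return loadingMgPerG * gramsPerMg * glucoseSimulated * anhydroglucoseMW / enzymeMWGPerMol;
}
```

With a loading of 10 mg/g, a 64 kDa enzyme, and 10^7 simulated glucose molecules, this gives about 253 enzyme molecules, which illustrates why crowding matters less at low dosages.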
Model implementation and simulations
The algorithms of the hydrolysis model were written in C++. The Mersenne Twister pseudo-random number generator (Matsumoto and Nishimura 1998) was used in the simulation of the cellulose structure and the hydrolysis process.
The cellulose structure was simulated for three model cellulose substrates (Avicel, filter paper, and cotton) to cover a range of substrates with different structural properties (DP and degree of crystallinity). Avicel is a low-DP cellulose, with a DP of only about 300 and a crystallinity index of 0.5–0.6, whereas cotton has a very high DP (about 2000–2500) and a crystallinity index of 0.85–0.95 (Zhang and Lynd 2004). Hydrolysis simulations were performed based on the experimental conditions: weight of solution (scale of hydrolysis), solid loading, cellulose content, total enzyme loading (mg protein/g cellulose), ratio of enzymes present (EG:CBH I:CBH II:BG), temperature, pH, and hydrolysis duration. Enzyme activities can be obtained from the supplier or the literature, or determined using standard protocols (Ghose 1987). Unless determined in the lab, the specific activities of enzymes from T. reesei were assumed to be 0.4, 0.08, and 0.16 IU/mg for EG I, CBH I, and CBH II, respectively (Zhang and Lynd 2006) for the model simulations. The output from the model included several data files containing glucose concentrations, oligosaccharide concentrations, the chain distribution profile (number of chains of various lengths), the crystallinity index profile (ratio of crystallinity at various time intervals), the solubility profile, and data sheets for each microfibril (illustrating the major properties associated with glucose molecules) at various times during hydrolysis.
Model validation
The data from model simulations were compared with various sets of experimental results from cellulose hydrolysis in our lab and from literature (Bezerra and Dias 2004; Bezerra et al. 2011) to validate the model under various hydrolysis conditions.
Validation with experimental data
The model was validated with the results obtained from the hydrolysis of pure cellulosic substrates (filter paper and cotton) using purified CBH I and CBH II. The cellulases CBH I and CBH II were purified from Celluclast (Novozymes, Denmark) using a series of chromatography steps on a BioLogic LP system (Bio-Rad Laboratories, Hercules, CA, USA). The purification experiments were performed at room temperature, and the collected enzymes were stored in a refrigerator at 4 °C.
Enzyme purification
The flow diagram of the steps followed in the CBH I and CBH II purification is shown in Fig. 4. In the first step of purification, the Celluclast enzyme mixture was desalted using a Sephadex G-25 Fine gel filtration column (dimensions: 2.5 cm × 10 cm). The protein was rebuffered in 50 mM Tris–HCl buffer (pH 7.0) at 5 mL/min. The desalted protein was fractionated by anion-exchange chromatography using a DEAE-Sepharose column (dimensions: 2.5 cm × 10 cm). The sample was loaded using 50 mM Tris–HCl buffer (pH 7.0) at a flow rate of 5 mL/min and was eluted stepwise: the first elution at 35% and the second elution at 100% of 0.2 M sodium chloride in 0.05 M Tris–HCl buffer (pH 7) (Jäger et al. 2010). The flow-through from the DEAE column (rich in CBH II) was concentrated and rebuffered in 50 mM sodium acetate buffer (pH 5.0) using a Pellicon XL 50 Ultrafiltration Cassette with Biomax 10 (Millipore, USA). The rebuffered protein was spiked with gluconolactone (final concentration of 1 mM) and loaded on the p-aminophenyl cellobioside (pAPC) affinity column (dimensions: 1.5 cm × 10 cm) with 0.1 M sodium acetate containing 1 mM gluconolactone and 0.2 M glucose (pH 5.0) at a flow rate of 1.5 mL/min (Jeoh et al. 2007; Sangseethong and Penner 1998). The gluconolactone in the buffer suppresses β-glucosidase activity, which can otherwise cleave the ligand (Sangseethong and Penner 1998). The bound CBH II protein was eluted using the running buffer containing 0.01 M cellobiose [100 mM sodium acetate buffer containing 1 mM gluconolactone, 0.2 M glucose, and 0.1 M cellobiose (pH 5.0)]. The purified CBH II from the affinity column was concentrated and loaded on a phenyl Sepharose column (dimensions: 1.0 cm × 10 cm) for hydrophobic interaction chromatography to separate core and intact proteins (Sangseethong and Penner 1998).
The sample was loaded in high salt (0.35 M ammonium sulfate in 25 mM sodium acetate buffer, pH 5.0) and eluted with a linear gradient from running buffer to elution buffer [25 mM acetate buffer containing 20% ethylene glycol (v/v), pH 5.0]. Hydrophobic interaction chromatography was also performed on the second elution (CBH I rich) from the anion-exchange column, after concentrating it and rebuffering it in 25 mM sodium acetate buffer. The enzyme was loaded in very high salt (0.75 M ammonium sulfate in 25 mM sodium acetate buffer, pH 5.0) and eluted with a linear gradient from running buffer to elution buffer [25 mM acetate buffer containing 5% ethylene glycol (v/v), pH 5.0]. The purified CBH II and CBH I fractions from the hydrophobic interaction column were concentrated and rebuffered in 50 mM sodium acetate buffer, pH 5.0. Protein-containing fractions were identified by measuring absorbance at 280 nm.
The fractions collected from the chromatographic purification steps shown in Fig. 4 were analyzed by SDS-polyacrylamide gel electrophoresis to check their purity. Based on the molecular weight comparison with the marker and on literature data, the single bands in the numbered lanes 1 and 2 of Fig. 5 correspond to CBH II (MW 54 kDa) and CBH I (MW 61–64 kDa), respectively (Jäger et al. 2010; Medve et al. 1998; Sangseethong and Penner 1998). The activities of CBH I and CBH II on Avicel were determined to be 0.478 and 0.379 IU/mg of protein, respectively.
During protein purification, the protein concentrations in the samples were determined by the Bradford assay using the Quick Start™ Bradford Protein Assay Kit (Bio-Rad, USA) with bovine serum albumin (BSA) as the standard. The activities of purified CBH I and CBH II were determined on Avicel in 50 mM sodium acetate buffer, pH 5.0. A 1 mL volume of Avicel suspension (10 g/L) with a final enzyme concentration of 0.1 mg/mL was incubated (mixed end to end) at 45 °C in 2 mL Eppendorf centrifuge tubes for 2 h (Jäger et al. 2010). After 2 h of incubation, the samples were heated at 95 °C for 5 min to stop the hydrolysis. The samples were centrifuged at 15,000 rpm for 5 min to separate the supernatant. The reducing sugar concentration in the supernatant was determined using the dinitrosalicylic acid (DNS) assay with glucose as the standard.
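The conversion from a measured reducing sugar concentration to a specific activity in IU/mg under the assay conditions above can be sketched as follows. The function name is hypothetical, and the sketch assumes that reducing sugars are expressed as glucose equivalents (molecular weight 180.16 g/mol) and that no blank correction is needed; whether the reported activities were computed in exactly this way is not stated.

```cpp
#include <cassert>

// Specific activity (IU/mg, i.e. umol reducing sugar released per minute per mg protein).
//   reducingSugarGPerL : DNS result as glucose equivalents (g/L, numerically equal to mg/mL)
//   volumeML           : reaction volume (mL)
//   enzymeMgPerML      : final enzyme concentration (mg/mL)
//   minutes            : incubation time (min)
double specificActivityIUPerMg(double reducingSugarGPerL, double volumeML,
                               double enzymeMgPerML, double minutes) {
    assert(volumeML > 0.0 && enzymeMgPerML > 0.0 && minutes > 0.0);
    const double glucoseMW = 180.16;  // g/mol
    // mg sugar / (g/mol) = mmol; multiply by 1000 to obtain umol.
    const double micromoles = reducingSugarGPerL * volumeML / glucoseMW * 1000.0;
    const double enzymeMg = enzymeMgPerML * volumeML;
    return micromoles / (minutes * enzymeMg);
}
```

For the 1 mL, 0.1 mg/mL, 2 h assay, a reading of 1.0 g/L reducing sugar corresponds to roughly 0.46 IU/mg.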
Enzymatic hydrolysis
The hydrolysis experiments were conducted at a cellulose (filter paper and cotton balls) concentration of 25 g/L and various enzyme loadings (5, 10, and 15 mg/g cellulose) in 50 mM sodium acetate, pH 5.0, in a total volume of 10 mL in 25 mL Erlenmeyer flasks closed with rubber stoppers. To prevent microbial contamination, 100 µL of 2% sodium azide was added to each flask. The experiments were carried out in a controlled-environment incubator shaker set at 45 °C and 125 rpm. Samples of 200 µL were withdrawn at 3, 6, 9, 12, 18, 24, 36, 48, and 72 h to determine the sugar concentrations and the hydrolysis profile. The samples were heated at 95 °C for 5 min to stop the reaction and were prepared for high-performance liquid chromatography (HPLC) analysis. All experiments were performed in triplicate.
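The arithmetic behind these conditions can be sketched in C++ as follows. Both helpers are hypothetical. The first gives the enzyme mass per flask from the cellulose concentration, volume, and loading; the second converts HPLC sugar concentrations into percent cellulose conversion using the commonly applied anhydro correction (mass ratios 162.14/180.16 for glucose and 324.28/342.30 for cellobiose), which is an assumption because the conversion formula is not stated in this section. The volume change caused by sampling is ignored.

```cpp
#include <cassert>

// Enzyme mass (mg) to add to one flask.
double enzymeDoseMg(double celluloseGPerL, double volumeML, double loadingMgPerG) {
    assert(celluloseGPerL >= 0.0 && volumeML >= 0.0 && loadingMgPerG >= 0.0);
    const double celluloseG = celluloseGPerL * volumeML / 1000.0;  // g cellulose in the flask
    return celluloseG * loadingMgPerG;
}

// Percent of the initial cellulose recovered as glucose and cellobiose.
double conversionPercent(double glucoseGPerL, double cellobioseGPerL, double celluloseGPerL) {
    assert(celluloseGPerL > 0.0);
    const double glucoseToAnhydro = 162.14 / 180.16;     // water added per glucose released
    const double cellobioseToAnhydro = 324.28 / 342.30;  // water added per cellobiose released
    const double hydrolyzed = glucoseGPerL * glucoseToAnhydro +
                              cellobioseGPerL * cellobioseToAnhydro;
    return 100.0 * hydrolyzed / celluloseGPerL;
}
```

For the conditions above, 10 mL at 25 g/L contains 0.25 g cellulose, so loadings of 5, 10, and 15 mg/g correspond to 1.25, 2.5, and 3.75 mg of enzyme per flask; a glucose concentration of 5 g/L with no cellobiose corresponds to about 18% conversion.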