
Prediction of phenolic compounds and glucose content from dilute inorganic acid pretreatment of lignocellulosic biomass using artificial neural network modeling


Dilute inorganic acid hydrolysis is one of the most promising pretreatment strategies, offering high recovery of fermentable sugars at low cost for the sustainable production of biofuels and chemicals from lignocellulosic biomass. The diverse phenolics derived from lignin degradation during pretreatment are the main inhibitors of enzymatic hydrolysis and fermentation. However, how the contents of derived phenolics and produced glucose vary under different conditions is still unclear because biomass pretreatment is highly non-linear. Here, an artificial neural network (ANN) model was developed for simultaneous prediction of the derived phenolic content (CPhe) and glucose yield (CGlc) in corn stover hydrolysate before microbial fermentation by integrating dilute acid pretreatment and enzymatic hydrolysis. Six processing parameters, namely inorganic acid concentration (CIA), pretreatment temperature (T), residence time (t), solid-to-liquid ratio (RSL), kind of inorganic acid (kIA), and enzyme loading dosage (E), were used as input variables. CPhe and CGlc were set as the two output variables. An optimized 6-12-2 topology for the ANN model was determined by comparing root mean square errors, and it showed good prediction performance for CPhe (R2 = 0.904) and CGlc (R2 = 0.906). Additionally, the relative importance of the six input variables on CPhe and CGlc was calculated for the first time by the Garson equation with the net weight matrices. The results indicated that CIA had the strongest effect (22%–23%) on both CPhe and CGlc, followed by E and T. In conclusion, the findings provide new insights into the sustainable development and inverse optimization of biorefinery processes from an ANN modeling perspective.



Nowadays, the concerns over climate change, especially the increasing greenhouse gases (GHGs) emissions, have necessitated a rethinking of traditional methods for fuels production (Field et al. 2020; Solarte-Toro et al. 2019). According to the statistics, about 25% of the total GHGs emission was contributed by transportation (Keasling et al. 2021). Reducing GHGs emissions and enhancing carbon capture/sequestration (CCS) are global concerns (Ishaq et al. 2021). To this end, one avenue is to produce advanced transportation fuels from atmospheric carbon resources (mainly CO2) or renewable biomass grown by fixing CO2 to partially replace fossil resources (Liu et al. 2020).

In the past decades, utilization of lignocellulosic biomass from non-food crops for production of fuels and fine chemicals has gained much attention (Luo et al. 2021a; Rajan et al. 2020) because their net carbon footprint is neutral. Lignin (15–30 wt%), cellulose (30–50 wt%), and hemicellulose (20–35 wt%) are three main components in lignocellulose (Schutyser et al. 2018). The structure of lignocellulose is complex since cellulose and hemicellulose are enwrapped by lignin, and hemicellulose is also interlaced with cellulose fibers (Liu et al. 2021), resulting in the biomass recalcitrance and low enzymatic efficiency for hydrolysis. Depolymerization of lignocellulose to obtain fermentable sugars (mainly glucose) is the key for production of fuels (Luo et al. 2018). Thus, decreasing biomass recalcitrance and structural complexity via efficient pretreatments is generally required (Liu et al. 2021).

To disrupt the close inter-component association between cellulose, hemicellulose, and lignin, various pretreatment strategies including acid-, alkaline-, ionic liquid-, and organic solvent-based methods have been developed (Hijosa-Valsero et al. 2017; Jönsson and Martín 2016; Xia et al. 2021). Among those pretreatment strategies, dilute inorganic acid hydrolysis is one of the most promising methods, with high recovery of fermentable sugars and low cost (Jönsson and Martín 2016), which is beneficial for the production of biofuels and chemicals. Nevertheless, during acid pretreatment of biomass, hemicellulose and lignin are partially solubilized, and the resulting fragments are degraded in the acidic environment (Zhang et al. 2021). Generally, three kinds of inhibitors, including weak organic acids (acetic acid, formic acid, etc.), furan derivatives, and phenolics (phenolic acids and phenolic aldehydes), are derived during pretreatment, and they affect enzymatic and fermentation efficiency (Chen et al. 2020; Yao et al. 2021). In particular, the phenolic compounds derived from lignin degradation during pretreatment, which are characterized by complex structure, diversity, low water solubility, and high hydrophobicity, were reported as the main limiting factor for industrial biofuel production (Gu et al. 2019).

To counteract the toxic effect of phenolics on the enzymatic and fermentation processes, detoxification of lignocellulosic hydrolysates and slurries by overliming, activated carbon, or water washing is widely implemented (Sivagurunathan et al. 2017). However, fermentable sugars are also partially removed, and treatment of the generated wastewater further deteriorates techno-economic performance. Additionally, the construction of robust strains by elucidating the response mechanism to phenolics could also weaken the inhibitory effect (Kumar et al. 2020; Luo et al. 2021b; Luo et al. 2020). For example, Jiménez-Bonilla et al. (2020) reported that overexpressing the efflux pump gene srpB from Pseudomonas putida improves the tolerance of Clostridium saccharoperbutylacetonicum to 1.2 g/L ferulic acid. Although systems metabolic engineering and adaptive evolution strategies could improve the robustness of microbes under stress from biomass-derived inhibitors, the concentrations of inhibitors that arise in biomass hydrolysate during pretreatment should be considered first.

Optimization of pretreatment conditions by evaluating glucose hydrolysis efficiency and derived phenolics has been carried out using different lignocellulosic feedstocks such as rice straw (Lee et al. 2012), sugarcane bagasse (Lv et al. 2017), etc. The content of lignin-derived phenolics in hydrolysate before fermentation is mainly attributed to pretreatment conditions, such as biomass species, pretreatment temperature, reaction time, solid-to-liquid ratio, etc. (Bhatia et al. 2020; Jönsson and Martín 2016), and it also changes during enzymatic hydrolysis (Yao et al. 2021). Although implementing numerous pretreatment experiments could achieve a sub-optimal result, it would unavoidably increase operational complexity and time consumption. Loading of high-cost cellulase also largely increases the total biorefinery cost. Furthermore, the relationship between pretreatment conditions, enzymatic conditions, and the derived phenolics content is still unclear. Hence, it remains challenging to systematically analyze the effects of biomass characteristics, pretreatment conditions, and enzymatic conditions on derived phenolics and glucose yield.

Development of bioprocess models for non-linear lignocellulosic bioprocessing is an efficient strategy for enabling the success of biorefinery and a bio-based circular economy (Unrean 2016). Recently, artificial intelligence (AI) technology, mainly machine learning (ML) algorithms, has proven competent for predicting and confirming the relative importance between input and output variables (Li et al. 2021). The effectiveness of ML methods for predicting pyrolytic gas yield and composition was verified (Tang et al. 2021), which could help to better understand biomass pyrolysis and syngas upgrading. An artificial neural network (ANN) model with a multilayer architecture (3-15-1) was optimized and used to predict the biogas production curve from cattle under mesophilic and thermophilic conditions (Ghatak and Ghatak 2018). An ANN model was built to predict sugar yields of pretreated rice straw during hydrolysis by considering three factors: biomass loading, particle size, and reaction time (Vani et al. 2015). Recently, Moodley et al. (2019) established two ANN tools for an inorganic salt pretreatment process and found that sugar yield from sugarcane leaf waste was sensitive to the alkali and salt concentrations. The above reports clearly show that ANNs can be trained with experimental data to generate efficient models of non-linear multivariate processes. To the best of our knowledge, prediction of lignin-derived phenolics content and glucose concentration from inorganic acid pretreatment of biomass with advanced modeling technology has not been reported thus far.

Focusing on the above-mentioned issues, in this study we aim to develop an ANN model for elucidating how phenolics are derived from corn stover by jointly investigating pretreatment with three typical inorganic acids (HCl, H2SO4, and H3PO4) and the subsequent enzymatic hydrolysis. Furthermore, the relative importance of pretreatment conditions (i.e., input variables) on phenolic and glucose concentrations (i.e., output variables) was elucidated for the first time by considering the neural net weights in the developed ANN model. The results would provide new insights into the biorefinery process for biofuels production.

Materials and methods

Materials and chemicals

The corn stover was collected from Lianyungang City, China. It was first cut and sieved to a particle size of ~0.4 mm. The fine corn stover was then dried in an oven (GZX-9140MBE, Shanghai Boxun Medical Biological Instrument Corp., China) at 60 °C for 12 h to remove the moisture and stored in plastic bags at 4 °C. Three kinds of inorganic acids, including a monobasic acid (hydrochloric acid, 37 wt%), a dibasic acid (sulfuric acid, 98 wt%), and a tribasic acid (phosphoric acid, 85 wt%), were used as pretreatment agents for corn stover depolymerization. The inorganic acids were purchased from Sinopharm Chemical Reagent Co., Ltd. All of the chemicals were used as received without further purification. A commercial cellulase, Cellic CTec2 (enzyme blend, SAE0020-50 mL solution), was obtained from Sigma-Aldrich (St. Louis, MO, USA) and used for hydrolyzing the pretreated corn stover to obtain glucose.

Dilute inorganic acid pretreatment and enzymatic hydrolysis

To systematically investigate the effects of pretreatment conditions on the derived phenolic compounds and glucose hydrolysis yield, six key parameters including inorganic acid concentration (CIA), pretreatment temperature (T), residence time (t), solid-to-liquid ratio (RSL), kind of inorganic acid (kIA), and enzyme loading dosage (E) were considered in this study. The acid pretreatment of corn stover was carried out in a 250-mL vertical reactor (TGYF-B, Gongyi Yuhua Instrument Co., Ltd., China) equipped with an electric magnetic stirrer and a temperature controller. First, the dried corn stover, inorganic acid, and 150 mL water were added into the reactor simultaneously. The experimental parameter ranges for corn stover pretreatment and enzymatic hydrolysis were carefully designed and are summarized in Table 1, which contains the raw data for training, validation, and testing of the ANN model. After acid hydrolysis, the pH of the pretreated mixture was adjusted to 5.0 with 8 mol/L NaOH solution. The overall effects of biomass-derived phenolic compounds on enzymatic hydrolysis were also considered because phenolics interact with cellulase. Thus, the pretreated mixture was not filtered and was used directly for enzymatic hydrolysis by adding cellulase at 10–20 FPU/g corn stover (Table 1). The reaction was conducted at 50 °C in a water bath at 150 rpm. After 72 h of enzymatic hydrolysis, the mixture was first boiled at 95–100 °C for 5 min to terminate the reaction. Then, the liquid fraction of the mixture (i.e., hydrolysate) was separated by vacuum filtration for determining phenolics and glucose contents.

Table 1 The design of operating conditions to perform the dilute acid pretreatment of lignocellulosic biomass for the development of ANN model

Development of ANN model

Selection of input variables and experimental data

Different kinds of biomass (corn stover, rice straw, switchgrass, sugarcane straw, etc.) have different ratios of cellulose, hemicellulose, and lignin, which result in various features of the derived phenolics and glucose yield even under the same pretreatment and enzymatic hydrolysis conditions (Pratto et al. 2020; Solarte-Toro et al. 2019). Among these lignocellulosic biomasses, corn stover is the largest crop residue in China (Yang et al. 2020b); thus, it was selected as the model biomass for the pretreatment experiments in this study.

The formation of biomass-derived phenolic compounds and the hydrolysis yield of glucose in hydrolysate are governed by a complex and non-linear process, which makes them difficult to predict directly with traditional constructive mathematical models. Thus, we used ANN modeling to predict the derived phenolic compounds content and glucose yield after the pretreatment and enzymatic hydrolysis processes. As described in the above section, six key parameters, CIA, T, t, RSL, kIA, and E (Table 1), were considered as the input variables for the development of the ANN model. The output variables were glucose concentration (CGlc) and phenolic content (CPhe) in biomass hydrolysate after 72 h of enzymatic hydrolysis. To train the ANN model adequately, 77 experimental runs were carried out according to the design of input variables shown in Table 1.

Data preprocessing, and the topology of ANN model

Figure 1A shows the step-by-step scheme of ANN model development to predict CPhe and CGlc from corn stover under different operating conditions. The ANN model was developed using Matlab R2019a (The MathWorks, Inc., USA). It is a multilayer neural network with interconnected neurons arranged in three layers: input, hidden, and output (Fig. 1B). The ANN model proposed in the present study is structured as follows: (1) two parameters, CGlc and CPhe, were considered as the output variables; (2) six variables, CIA, T, t, RSL, kIA, and E, in one input layer were fully connected to the hidden layer; (3) one hidden layer had n neurons; and (4) biases b1,j (the bias of inputs, j = 1, 2, 3, …, n) and b2,k (the bias of output layer, k = 1, 2) were used for training the network.
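The topology described above can be sketched as a single forward pass. This is only an illustration under stated assumptions: the text does not name the activation functions, so a tanh hidden layer and a linear output layer are assumed here, and the weights, the helper name `forward`, and the input vector are hypothetical placeholders rather than the trained values of Table 3.

```python
import math
import random

def forward(x, IW, b1, LW, b2):
    """One forward pass through a 6-n-2 network.

    x  : list of 6 normalized inputs
    IW : n x 6 input-to-hidden weights, IW[j][i]
    b1 : n hidden-layer biases
    LW : 2 x n hidden-to-output weights, LW[k][j]
    b2 : 2 output-layer biases
    """
    # Hidden layer: tanh activation is an assumption, not stated in the text.
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(IW, b1)]
    # Output layer: a linear activation is likewise an assumption.
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(LW, b2)]

# Hypothetical random weights, used only to show the shapes involved.
rng = random.Random(0)
n = 12
IW = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(n)]
b1 = [rng.uniform(-1, 1) for _ in range(n)]
LW = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(2)]
b2 = [rng.uniform(-1, 1) for _ in range(2)]

x = [0.0, 0.5, -0.5, 1.0, -1.0, 0.2]   # placeholder normalized inputs
outputs = forward(x, IW, b1, LW, b2)   # two values, one per output variable
```

With n hidden neurons, such a network carries 6n + n + 2n + 2 adjustable parameters, which is one reason the choice of n matters for a dataset of this size.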

Fig. 1

The flow diagram of the development of ANN model (A) and the topology structure of ANN model (B) to predict the glucose concentration (CGlc) and total phenolic content (CPhe) in biomass hydrolysate after dilute inorganic acid pretreatment and 72 h enzymatic processes. A NP, network performance; TP, target performance of network; NN, neural network. B b1,j, the bias of inputs (j = 1, 2, 3, …, n); b2,k, the bias of output layer (k = 1, 2)

First, the whole dataset was split into training, validation, and testing groups with a ratio of 75:15:10. The input dataset was normalized to the range [−1, 1] before training the network to obtain an accurate model. The built-in ‘mapminmax’ function was used for the normalization of experimental data, which is equivalent to Eq. (1):

$$V^{\prime} = \frac{{V - V_{\min } }}{{V_{\max } - V_{\min } }}\left( {V^{\prime}_{\max } - V^{\prime}_{\min } } \right) + V^{\prime}_{\min } ,$$

where V′, V, Vmax, Vmin, V′max, and V′min represent the new value, the original value, the original maximum limit, the original minimum limit, the new maximum limit (i.e., 1), and the new minimum limit (i.e., −1), respectively.
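Eq. (1) can be illustrated with a few lines of Python. This is a minimal sketch of the formula itself, not of the MATLAB function; the helper name `minmax_scale` and the example temperature column are hypothetical.

```python
def minmax_scale(values, new_min=-1.0, new_max=1.0):
    """Rescale a list of numbers linearly onto [new_min, new_max] per Eq. (1)."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # Eq. (1) is undefined for a constant column (division by zero).
        raise ValueError("constant column cannot be rescaled")
    span = v_max - v_min
    return [(v - v_min) / span * (new_max - new_min) + new_min for v in values]

# Hypothetical temperature column in deg C (illustrative values only).
temps = [120.0, 140.0, 160.0]
scaled = minmax_scale(temps)
```

The smallest value maps to −1, the largest to 1, and the midpoint to 0, so every input variable enters the network on the same scale regardless of its physical unit.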

The ANN model was trained with the Adam optimizer, using a learning rate of 0.001 and a training batch size of 2. The number of neurons (n) in the hidden layer was determined by the empirical equation Eq. (2) (Yang et al. 2020a). The root mean square error (RMSE) obtained with different neuron numbers and iterations was used to evaluate the accuracy of model predictions and was calculated by Eq. (3). Based on the RMSE results, an optimized ANN model with better performance was developed to predict CGlc and CPhe (Fig. 1A):

$$n = \sqrt {i + k} + \alpha ,$$
$${\text{RMSE}} = \sqrt {\frac{1}{m}\sum\limits_{h = 1}^{m} {\left( {y_{{{\text{pre}}}}^{(h)} - \hat{y}_{\exp }^{(h)} } \right)^{2} } } ,$$

where n is the number of neurons in the hidden layer; i is the number of input variables; k is the number of output variables; α is a constant in the range of 1–10; \(y_{\text{pre}}^{(h)}\) is the predicted value of the output variable (CPhe or CGlc) for sample h; \(\hat{y}_{\exp}^{(h)}\) is the corresponding experimental value; and m is the number of samples for training, validation, or testing of the ANN models.
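As a rough illustration of Eqs. (2) and (3), the sketch below computes the candidate range of hidden neurons for i = 6 and k = 2 and defines an RMSE helper. Rounding the lower bound down and the upper bound up is an assumption made here so that the integer range can be enumerated; the function names are hypothetical.

```python
import math

def hidden_neuron_range(n_inputs, n_outputs, alpha_min=1, alpha_max=10):
    """Integer bounds on n from Eq. (2), n = sqrt(i + k) + alpha."""
    base = math.sqrt(n_inputs + n_outputs)
    # Rounding the lower bound down and the upper bound up is an assumption.
    return math.floor(base + alpha_min), math.ceil(base + alpha_max)

def rmse(predicted, experimental):
    """Root mean square error between two equal-length sequences, Eq. (3)."""
    if len(predicted) != len(experimental):
        raise ValueError("sequences must have the same length")
    m = len(predicted)
    return math.sqrt(sum((p - e) ** 2 for p, e in zip(predicted, experimental)) / m)

low, high = hidden_neuron_range(6, 2)
candidates = list(range(low, high + 1))  # hidden-layer sizes to compare by RMSE
```

Since sqrt(6 + 2) ≈ 2.83, adding α between 1 and 10 gives values from about 3.83 to 12.83, which under the rounding assumed here spans 3 to 13 neurons, i.e., 11 candidate networks.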

Analysis of relative importance of input variables

The parameters IWj,i, LWk,j, b1,j, and b2,k of the developed ANN model can be used to simulate the output variables (CGlc, CPhe). In addition, the relative importance of the input variables on the two output variables was evaluated from the neural net weight matrices (IWj,i and LWk,j, Fig. 1B) with the Garson equation (Puig-Arnavat et al. 2013; Sunphorka et al. 2017). The Garson equation is based on partitioning the connection weights in the network. The numerator represents the sum of the absolute weight products for each input (i = 1, 2…6), while the denominator represents the sum of the absolute values of all weights feeding into the hidden layer (j = 1, 2…n). The Garson equation adapted to the ANN topology is presented in Eq. (4):

$$I_{i} = \frac{{\sum\limits_{j = 1}^{n} {\left( {\frac{{\left| {IW_{j,i} } \right|}}{{\sum\nolimits_{p = 1}^{6} {\left| {IW_{j,p} } \right|} }} \times \left| {LW_{k,j} } \right|} \right)} }}{{\sum\limits_{q = 1}^{6} {\left\{ {\sum\limits_{j = 1}^{n} {\left( {\frac{{\left| {IW_{j,q} } \right|}}{{\sum\nolimits_{p = 1}^{6} {\left| {IW_{j,p} } \right|} }} \times \left| {LW_{k,j} } \right|} \right)} } \right\}} }} \times 100\% ,$$

where Ii is the relative importance of the ith input variable on the output variable (CGlc or CPhe); IWj,i is the neural net weight to the jth neuron of the hidden layer from the ith input variable; LWk,j is the weight to the kth output variable from the jth neuron of the hidden layer; and p and q are summation indices running over the six input variables.
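The weight partitioning of Eq. (4) can be sketched in Python as follows. The small weight matrices are hypothetical and serve only to exercise the calculation; they are not the values of Table 3, and the function name `garson_importance` is an assumption of this sketch.

```python
def garson_importance(IW, LW_k):
    """Relative importance (%) of each input for one output, per Eq. (4).

    IW   : n x m input-to-hidden weights, IW[j][i]
    LW_k : n hidden-to-output weights for output k, LW_k[j]
    """
    n_inputs = len(IW[0])
    contrib = [0.0] * n_inputs
    for row, lw in zip(IW, LW_k):
        # Share of each input in hidden neuron j, scaled by |LW_k[j]|.
        # Assumes no hidden neuron has all-zero input weights.
        row_total = sum(abs(w) for w in row)
        for i, w in enumerate(row):
            contrib[i] += abs(w) / row_total * abs(lw)
    total = sum(contrib)
    return [100.0 * c / total for c in contrib]

# Hypothetical 2-neuron, 3-input network (not the weights of Table 3).
IW = [[1.0, 1.0, 2.0],
      [3.0, -1.0, 0.0]]
LW_k = [2.0, -4.0]
importance = garson_importance(IW, LW_k)
```

Because only absolute values enter the calculation, the resulting percentages sum to 100% and describe the magnitude of each input's influence, not its direction.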

Analytical methods

The glucose concentration in corn stover hydrolysate (CGlc) was determined by a biosensor analyzer (S-10, Sieman Technology, China) (Luo et al. 2019). Total phenolic content (CPhe) in the hydrolysate was determined by the Folin–Ciocalteu assay with gallic acid as the standard (Xu et al. 2021), with some modifications. Briefly, a 0.4 mL sample was mixed with 2.6 mL water and 0.5 mL Folin–Ciocalteu reagent (1.0 mol/L, Sinopharm Chemical Reagent Co., Ltd., Shanghai). After 5 min, 5.0 mL water and 1.5 mL Na2CO3 solution (20%, w/v) were added simultaneously. Then, the mixture was shaken in the dark at 40 °C for 1 h. Finally, the absorbance of the reaction mixture was measured at 760 nm with a UV–Vis spectrophotometer (UV-2100, Unico Instrument Co., Ltd., China). The CPhe (g/L) was then calculated by Eq. (5):

$$C_{{{\text{Phe}}}} = \frac{{a \times A_{760} }}{{V_{s} }} \times N,$$

where a is the linear coefficient of the standard curve; A760 is the absorbance of the reaction mixture at 760 nm; Vs is the volume of the reaction mixture; and N is the dilution ratio.

Statistical analysis

The experimental data of CPhe and CGlc for development of ANN model are represented as the mean ± standard deviation (SD) of three independent experiments. Significant differences were confirmed with a two-tailed Student’s t-test aided by Microsoft Excel 2016.


Effects of inorganic acid pretreatment/enzymatic hydrolysis on the content of derived phenolics and glucose

For efficient production of biofuels and fine chemicals from lignocellulosic biomass via microbial fermentation, an optimized pretreatment process featuring a high glucose yield from the feedstock and a low concentration of derived inhibitors is crucial (Liu et al. 2021). Inorganic acid-based pretreatment efficiently solubilizes hemicellulose from lignocellulose and also improves cellulose digestibility (Jönsson and Martín 2016; Zabed et al. 2016). Therefore, dilute inorganic acid hydrolysis is widely used in biorefinery processes. In this study, focusing on dilute inorganic acid pretreatment of corn stover, the overall effects of pretreatment conditions on phenolics and glucose concentration after 72 h of enzymatic hydrolysis were studied, and the results are shown in Fig. 2. Three typical inorganic acids, HCl (monobasic acid), H2SO4 (dibasic acid), and H3PO4 (tribasic acid), were used as the pretreatment reagents. It should be noted that, in these cases, the solid-to-liquid ratio (RSL) and cellulase loading dosage (E) were kept at 10% and 20 FPU/g corn stover, respectively.

Fig. 2

Effects of dilute inorganic acid pretreatment of corn stover on the glucose concentration (CGlc) and derived phenolic concentration (CPhe) in hydrolysate after 72 h enzymatic hydrolysis. The typical 15 batch experiments results were selected from Table S1 with same values of the ratio of solid to liquid (RSL, 0.10) and cellulase loading dosage (E, 20 FPU/g corn stover)

When corn stover was pretreated with 0.05 mol/L HCl at 160 °C for 60 min, the concentration of derived phenolic compounds in the hydrolysate (CPhe) was 1.13 g/L, and the glucose content (CGlc) reached 22.8 g/L. Under the same conditions with H2SO4, CGlc increased by 12.3% (to 25.6 g/L), and CPhe also rose to 1.34 g/L, both significantly different (p < 0.05) from the HCl pretreatment. In contrast, pretreatment with 0.05 mol/L H3PO4 at 160 °C for 60 min gave a glucose concentration of only 20.3 g/L with a higher level of CPhe (1.97 g/L, Fig. 2). The results indicated that cellulase might tolerate a low concentration of phenolics. For the HCl pretreatment of corn stover, CPhe increased to 1.64 g/L when the acid concentration was raised from 0.05 to 0.1 mol/L, but this phenomenon was not found in the H2SO4 and H3PO4 pretreatment processes. In addition, the effects of different RSL and E on CGlc and CPhe were also studied, and the detailed experimental data are shown in Additional file 1: Table S1. Since glucose and derived phenolic contents are mainly determined by complex pretreatment conditions and an enzymatic process with multivariate non-linear features (Huang et al. 2007), it is still challenging to infer the changing patterns of CGlc and CPhe only by implementing numerous experiments. Hence, it is necessary to explore advanced methods for elucidating the relative importance of operational conditions on CGlc or CPhe for further optimizing the lignocellulose pretreatment process.

Optimization and determination of key parameters in ANN model

As shown in Fig. 1B, the multilayer ANN model for the prediction of CGlc and CPhe consisted of one input layer with 6 neurons, one hidden layer, and one output layer. Since the number of neurons in the hidden layer is a key parameter of an ANN model, a trial-and-error approach was applied to ensure a relatively fast and good convergence of RMSE (Puig-Arnavat et al. 2013). The results of 11 ANN models with different neuron numbers (n) after 500 iterations are shown in Table 2. Here, the empirical equation Eq. (2) was used to determine the range of neuron numbers (from 3 to 13) in the hidden layer (Yang et al. 2020a). As shown in Table 2, when the hidden layer had 3 neurons, the RMSE for CGlc reached 7.16 in the training dataset and 7.12 in the validation dataset. For CPhe, an RMSE range of 0.41–0.54 was obtained with 3 neurons. When the number of neurons in the hidden layer was changed from 3 to 13, RMSE varied because of the different parameters obtained while developing the ANN models. Considering the RMSE for CGlc and CPhe together, the ANN model with 12 neurons performed better, showing relatively lower RMSE values (Table 2). In addition, the effect of the number of iterations on the RMSE for CPhe during training and validation of the ANN models was also investigated. The RMSE patterns in Fig. 3 indicated that increasing the iterations from 400 to 800 effectively decreased the RMSE of CPhe, which then remained stable after 800 iterations. Therefore, 800 iterations with the best network structure of 6-12-2 were adopted.

Table 2 Computational results of RMSE during the training and validation processes with different numbers of hidden-layer neurons in the ANN models. RMSE is the root mean square error calculated by Eq. (3)
Fig. 3

The changing patterns of RMSE (CPhe) during training and validation with 1000 iterations for the development of ANN models. The subplot is the enlarged visualization of RMSE under 400–900 iterations

Training and testing of the ANN model

The best ANN structure, a single hidden layer consisting of n = 12 neurons, was obtained after numerical experiments using the training and validation datasets (Fig. 1B; Table 2). The values of CGlc and CPhe predicted during training with the 57-batch dataset (each batch including 6 input variables and 2 output variables, Additional file 1: Table S1) are shown in Fig. 4. For the predicted values of CGlc during training, the RMSE was 5.77 (Fig. 4A), and the corresponding RMSE for CPhe was lower at 0.44 (Fig. 4B). In addition, the optimized weights (IWj,i and LWk,j) and biases (b1,j and b2,k) of the proposed ANN model are listed in Table 3.
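The 57 training batches are consistent with 75% of the 77 runs rounded down (77 × 0.75 = 57.75). A possible way to reproduce a split of this kind is sketched below; the rounding rule for the validation group, the random seed, and the function name `split_indices` are assumptions, since the text does not state how the remaining 20 runs were divided.

```python
import random

def split_indices(n_samples, train_frac=0.75, val_frac=0.15, seed=0):
    """Shuffle sample indices and split them into train/validation/test groups."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = int(n_samples * train_frac)   # rounded down: 57 of 77
    n_val = round(n_samples * val_frac)     # assumption: nearest integer
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]        # remainder goes to testing
    return train, val, test

train, val, test = split_indices(77)
```

Under these assumptions the 77 runs divide into 57, 12, and 8 samples, with every run assigned to exactly one group.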

Fig. 4

Predicted and experimental values of CGlc (A) and CPhe (B) during training ANN model. The experimental values of CGlc and CPhe were the means ± SD obtained from three replicates. The input values for training ANN model were the means from three experiments, and thus no error bars in the predicted values CGlc and CPhe in A and B

Table 3 Weights and biases of the hidden and output layers used in the developed ANN model for the prediction of CGlc and CPhe

To verify the effectiveness of the proposed ANN model, a testing dataset with suitable ranges of input/output variables was also used to predict CGlc and CPhe under different dilute inorganic acid pretreatment conditions. The fitting relationships between the predicted and experimental values of CGlc and CPhe are plotted in Fig. 5. The slope of the fitting curve and the coefficient of determination (R2) are the two key parameters for accurate evaluation of the proposed ANN model. In other words, a better fit between the values predicted by the developed model and the experimental values is generally characterized by a slope and an R2 close to 1.0. As shown in Fig. 5, the diagonal of the plot (Y = X), with a slope of 1.0, is displayed for clear comparison. The R2 of the CGlc fitting curve was 0.906 over the range of 7.5–25 g/L, with a slope of 0.86. In addition, the experimental values of CPhe were located at 0.8–3.0 g/L. The ANN model gave a fitting curve with a slope of 0.82 and an R2 of 0.904 for the prediction of CPhe. Based on the fitting performance shown in Fig. 5, it is concluded that ANN modeling is an efficient tool for simultaneously predicting CGlc and CPhe in the non-linear and complex input–output system comprising corn stover pretreatment and enzymatic hydrolysis.
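The two fit statistics discussed above can be computed as in the following sketch. It assumes an ordinary least-squares line of predicted against experimental values and takes R2 as the squared Pearson correlation; the text does not specify the exact definition used, and the data points are hypothetical.

```python
def fit_slope_and_r2(experimental, predicted):
    """Least-squares line predicted = slope * experimental + intercept, with R^2."""
    m = len(experimental)
    mean_x = sum(experimental) / m
    mean_y = sum(predicted) / m
    sxx = sum((x - mean_x) ** 2 for x in experimental)
    syy = sum((y - mean_y) ** 2 for y in predicted)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(experimental, predicted))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = sxy ** 2 / (sxx * syy)  # squared Pearson correlation (assumed definition)
    return slope, intercept, r2

# Hypothetical values in g/L, chosen only to exercise the function.
exp_vals = [10.0, 15.0, 20.0, 25.0]
pre_vals = [11.0, 15.0, 19.0, 23.0]
slope, intercept, r2 = fit_slope_and_r2(exp_vals, pre_vals)
```

In this toy case the points lie exactly on a line with slope 0.8, so R2 is 1 even though the predictions deviate from the diagonal. This is why the slope and R2 are read together: R2 measures scatter around the fitted line, while the slope shows how far that line departs from Y = X.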

Fig. 5

Experimental validation and fitting relationship of CGlc and CPhe based on the developed ANN model. CExpGlc, experimental values of CGlc; CExpPhe, experimental values of CPhe; CPreGlc, predicted values of CGlc obtained by the developed ANN model; CPrePhe, predicted values of CPhe obtained by the developed ANN model. The experimental values of CGlc and CPhe are presented as the means ± SD (n = 3) in Fig. 5

Relative importance of input parameters on output variables

For the pretreatment of lignocellulosic biomass, elucidating and understanding the influence of pretreatment or enzymatic conditions on glucose hydrolysis yield and inhibitors formation are beneficial for guiding the biorefinery process (Bhatia et al. 2020). Although numerous studies have investigated the effects of pretreatment and enzymatic methods on glucose yield and inhibitors formation (Hassan et al. 2018; Jönsson and Martín 2016), the quantitative relationship properties between operation variables and those derived contents in biomass hydrolysate before fermentation are still unclear.

Focusing on this concern, the relative importance (Ii, i = 1, 2…6) of the six input variables on the two output variables, CGlc and CPhe, was analyzed (Fig. 6). Ii was calculated by the Garson equation, Eq. (4), with the weight matrices IWj,i and LWk,j of the developed ANN model, which are listed in Table 3. As shown in Fig. 6A, all six input variables influenced CGlc, with relative importance in the range of 12–23%. The five parameters of the pretreatment process (CIA, T, t, RSL, and kIA) together account for 79% of the importance, and enzyme dosage (E) accounts for the remaining 21%. Interestingly, the most important variable for CGlc is the concentration of inorganic acid (CIA, I1 = 23%), which is even higher than that of enzyme dosage (21%). The results revealed that an efficient pretreatment strategy is beneficial for glucose hydrolysis, which is mainly attributed to the improved accessibility of cellulose during the pretreatment process (Siqueira et al. 2017; Xu et al. 2016). The relative importance of T, RSL, and kIA on CGlc is around 14%–16%, while residence time (I3, t) has the lowest index (Fig. 6A). In addition, the relative importance of the input variables on CPhe is plotted in Fig. 6B. Similarly, inorganic acid concentration (CIA) has the highest importance, 22%, on CPhe. This indicates that a severe acidic environment with a higher concentration of H+ could improve the efficiency of lignin deconstruction coupled with phenolics formation (He et al. 2020). The other four variables of the pretreatment process account for 60% of the importance on CPhe (i.e., T for 19%, t for 11%, RSL for 17%, and kIA for 13%, Fig. 6B). It should be noted that enzyme dosage (E) retained a high importance of 18% on CPhe (Fig. 6B), only slightly lower than its importance of 21% on CGlc (Fig. 6A). The relative importance of enzyme dosage on CPhe reflects that the derived phenolic content in biomass hydrolysate changes during enzymatic hydrolysis.
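The percentages reported for CPhe can be cross-checked with simple arithmetic. The sketch below only re-adds the values quoted in the paragraph above and ranks them; the variable names are illustrative.

```python
# Relative importance (%) on CPhe as quoted in the text (Fig. 6B).
cphe_importance = {"CIA": 22, "T": 19, "t": 11, "RSL": 17, "kIA": 13, "E": 18}

pretreatment = ["CIA", "T", "t", "RSL", "kIA"]
pretreatment_total = sum(cphe_importance[name] for name in pretreatment)
other_four = pretreatment_total - cphe_importance["CIA"]  # T, t, RSL, kIA
overall = sum(cphe_importance.values())

# Variables ordered from most to least important.
ranking = sorted(cphe_importance, key=cphe_importance.get, reverse=True)
```

The four pretreatment variables other than CIA sum to 60%, the six values together sum to 100%, and the ranking for CPhe is CIA, T, E, RSL, kIA, t.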

Fig. 6

Relative importance of six input parameters on CGlc (A) and CPhe (B), which calculated by Eq. (4) with the detailed weights of the developed ANN model shown in Table 3


Optimization of the biorefinery process can lead to highly efficient production of biofuels from renewable resources such as lignocellulosic biomass. The traditional optimization method, called “one variable at a time” (OVAT), is time-consuming and requires a large number of experiments. To circumvent this limitation, empirical models are used to elucidate the relationship between operating parameters and final outcomes. Recently, various models such as response surface methodology (RSM) and ANN were reported (Das et al. 2015; Fernandes et al. 2020; Sewsynker-Sukai and Gueguim Kana 2018). The drawback of RSM is that it is restricted to a quadratic correlation between conditions because it assumes a second-order polynomial equation (Fernandes et al. 2020). Therefore, ANN modeling was used here to build efficient models to predict CPhe and CGlc.

Some ANN models were recently developed to assess biomass pretreatment for the production of biofuels and fine chemicals (Moodley et al. 2019; Sunphorka et al. 2017; Vani et al. 2015). Compared with those models, the ANN model proposed in this study has three advantages. Firstly, the content of the diverse phenolic compounds derived from dilute inorganic acid pretreatment of lignocellulosic biomass, which is one of the most promising approaches for industrial implementation, was clarified for the first time using corn stover as the feedstock. Secondly, the dilute inorganic acid pretreatment and enzymatic hydrolysis processes were considered from a systematic perspective, because CPhe and CGlc in biomass hydrolysate are the key factors affecting the overall fermentation performance. Lastly, the relative importance of the operating variables for the output variables (CPhe and CGlc) was clearly elucidated from the weight matrices of the developed ANN model. It should be noted that although ANN models cannot accurately predict results beyond the range of operational parameters in the training/validation datasets, they can still provide an estimate of parameters in an uncharted workspace by capturing the trends learned during training (Rashid et al. 2021). Collecting previously reported data for the development of advanced ANN models would be an efficient strategy. Therefore, it is concluded that ANN modeling is a powerful tool for predicting key parameters in some crucial multivariate non-linear bioprocesses. The relative importance analysis also provides new insights into biochemical process assessment and optimization.

Although the phenolic compounds are formed directly by lignin degradation during pretreatment, some studies have found that their content (CPhe) still changes during enzymatic hydrolysis. This is mainly attributed to interactions between cellulase and lignin-derived phenolics (Yao et al. 2021; Zhao et al. 2021). In addition, water-soluble lignin-derived phenolics are adsorbed by cellulase and inhibit its enzymatic efficiency (Yuan et al. 2021). These findings could explain why the enzyme dosage has a high relative importance of 18% for CPhe (Fig. 6B).

Although dilute inorganic acid pretreatment was selected to elucidate the features of phenolic content derived from corn stover, other important factors should also be considered to further improve the sustainability of the biorefinery process. Different pretreatment methods differ in their ability to solubilize lignin, cellulose, and hemicellulose. The lignin removal efficiency of alkaline-based hydrolysis is generally higher than that of dilute acid-based methods (Zabed et al. 2016). If the aim is only to investigate the efficiency of lignin removal from lignocellulosic biomass, alkaline-based pretreatment might be a better choice. In addition, operation parameters such as particle size and biomass species also influence the fermentable sugar and derived phenolic contents (Vani et al. 2015) and should be explored in the future.


An artificial neural network (ANN) model was developed to simultaneously predict the phenolic compound content (CPhe) and glucose yield (CGlc) in biomass hydrolysate from dilute inorganic acid pretreatment and enzymatic hydrolysis. Five pretreatment parameters and the enzyme loading dosage were used as the six input variables in the optimized ANN model, which has one hidden layer with 12 neurons. The results indicated that the developed ANN model has a good fitting performance (R2 > 0.90) for the prediction of CPhe and CGlc. The relative importance of the six input variables for CPhe and CGlc was also calculated to provide new insights for optimizing biorefinery processes to produce biofuels.
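As a rough illustration of the quantities summarized here, the sketch below runs one forward pass through a 6–12–2 network and computes the two fit metrics. It rests on assumptions that are not confirmed by the source: tanh hidden units with linear outputs, R2 defined as 1 − SS_res/SS_tot, random placeholder weights, and hypothetical function names.

```python
import math
import random


def forward(x, w_ih, b_h, w_ho, b_o):
    """One forward pass: inputs -> tanh hidden layer -> linear outputs."""
    hidden = [
        math.tanh(sum(x[i] * w_ih[i][j] for i in range(len(x))) + b_h[j])
        for j in range(len(b_h))
    ]
    return [
        sum(hidden[j] * w_ho[j][k] for j in range(len(hidden))) + b_o[k]
        for k in range(len(b_o))
    ]


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Placeholder 6-12-2 parameters (random, NOT the fitted model).
rng = random.Random(1)
n_in, n_hid, n_out = 6, 12, 2
w_ih = [[rng.uniform(-1, 1) for _ in range(n_hid)] for _ in range(n_in)]
b_h = [rng.uniform(-1, 1) for _ in range(n_hid)]
w_ho = [[rng.uniform(-1, 1) for _ in range(n_out)] for _ in range(n_hid)]
b_o = [rng.uniform(-1, 1) for _ in range(n_out)]

# One normalized input vector in the order CIA, T, t, RSL, kIA, E (made up).
sample = [0.2, 0.5, 0.4, 0.1, 0.0, 0.7]
c_phe, c_glc = forward(sample, w_ih, b_h, w_ho, b_o)
print("predicted (normalized) CPhe, CGlc:", c_phe, c_glc)
```

Comparing `rmse` across candidate hidden-layer sizes is one way a topology such as 6–12–2 could be selected, with `r_squared` then reported separately for each output.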

Availability of data and materials

The data supporting the conclusions of this article are included in the main manuscript.


  1. Bhatia SK, Jagtap SS, Bedekar AA, Bhatia RK, Patel AK, Pant D, Rajesh Banu J, Rao CV, Kim Y-G, Yang Y-H (2020) Recent developments in pretreatment technologies on lignocellulosic biomass: Effect of key parameters, technological improvements, and challenges. Bioresour Technol 300:122724

  2. Chen X, Zhai R, Li Y, Yuan X, Liu Z-H, Jin M (2020) Understanding the structural characteristics of water-soluble phenolic compounds from four pretreatments of corn stover and their inhibitory effects on enzymatic hydrolysis and fermentation. Biotechnol Biofuels 13:44

  3. Das S, Bhattacharya A, Haldar S, Ganguly A, Gu S, Ting YP, Chatterjee PK (2015) Optimization of enzymatic saccharification of water hyacinth biomass for bio-ethanol: comparison between artificial neural network and response surface methodology. Sustain Mater Technol 3:17–28

  4. Fernandes CD, Nascimento VRS, Meneses DB, Vilar DS, Torres NH, Leite MS, Vega Baudrit JR, Bilal M, Iqbal HMN, Bharagava RN, Egues SM, Romanholo Ferreira LF (2020) Fungal biosynthesis of lignin-modifying enzymes from pulp wash and Luffa cylindrica for azo dye RB5 biodecolorization using modeling by response surface methodology and artificial neural network. J Hazard Mater 399:123094

  5. Field JL, Richard TL, Smithwick EAH, Cai H, Laser MS, LeBauer DS, Long SP, Paustian K, Qin Z, Sheehan JJ, Smith P, Wang MQ, Lynd LR (2020) Robust paths to net greenhouse gas mitigation and negative emissions via advanced biofuels. Proc Natl Acad Sci USA 117:21968–21977

  6. Ghatak MD, Ghatak A (2018) Artificial neural network model to predict behavior of biogas production curve from mixed lignocellulosic co-substrates. Fuel 232:178–189

  7. Gu H, Zhu Y, Peng Y, Liang X, Liu X, Shao L, Xu Y, Xu Z, Liu R, Li J (2019) Physiological mechanism of improved tolerance of Saccharomyces cerevisiae to lignin-derived phenolic acids in lignocellulosic ethanol fermentation by short-term adaptation. Biotechnol Biofuels 12:268

  8. Hassan SS, Williams GA, Jaiswal AK (2018) Emerging technologies for the pretreatment of lignocellulosic biomass. Bioresour Technol 262:310–318

  9. He J, Huang C, Lai C, Huang C, Li M, Pu Y, Ragauskas AJ, Yong Q (2020) The effect of lignin degradation products on the generation of pseudo-lignin during dilute acid pretreatment. Ind Crop Prod 146:112205

  10. Hijosa-Valsero M, Paniagua-Garcia AI, Diez-Antolinez R (2017) Biobutanol production from apple pomace: the importance of pretreatment methods on the fermentability of lignocellulosic agro-food wastes. Appl Microbiol Biotechnol 101:8041–8052

  11. Huang J, Mei LH, Xia J (2007) Application of artificial neural network coupling particle swarm optimization algorithm to biocatalytic production of GABA. Biotechnol Bioeng 96:924–931

  12. Ishaq H, Ali U, Sher F, Anus M, Imran M (2021) Process analysis of improved process modifications for ammonia-based post-combustion CO2 capture. J Environ Chem Eng 9:104928

  13. Jiménez-Bonilla P, Zhang J, Wang Y, Blersch D, de Bashan L-E, Guo L, Wang Y (2020) Enhancing the tolerance of Clostridium saccharoperbutylacetonicum to lignocellulosic-biomass-derived inhibitors for efficient biobutanol production by overexpressing efflux pumps genes from Pseudomonas putida. Bioresour Technol 312:123532

  14. Jönsson LJ, Martín C (2016) Pretreatment of lignocellulose: formation of inhibitory by-products and strategies for minimizing their effects. Bioresour Technol 199:103–112

  15. Keasling J, Garcia Martin H, Lee TS, Mukhopadhyay A, Singer SW, Sundstrom E (2021) Microbial production of advanced biofuels. Nat Rev Microbiol 19:701–715

  16. Kumar V, Yadav SK, Kumar J, Ahluwalia V (2020) A critical review on current strategies and trends employed for removal of inhibitors and toxic materials generated during biomass pretreatment. Bioresour Technol 299:122633

  17. Lee K-M, Kalyani D, Tiwari MK, Kim T-S, Dhiman SS, Lee J-K, Kim I-W (2012) Enhanced enzymatic hydrolysis of rice straw by removal of phenolic compounds using a novel laccase from yeast Yarrowia lipolytica. Bioresour Technol 123:636–645

  18. Li J, Zhang W, Liu T, Yang L, Li H, Peng H, Jiang S, Wang X, Leng L (2021) Machine learning aided bio-oil production with high energy recovery and low nitrogen content from hydrothermal liquefaction of biomass with experiment verification. Chem Eng J 425:130649

  19. Liu Z, Wang K, Chen Y, Tan T, Nielsen J (2020) Third-generation biorefineries as the means to produce fuels and chemicals from CO2. Nat Catal 3:274–288

  20. Liu Y, Cruz-Morales P, Zargar A, Belcher MS, Pang B, Englund E, Dan Q, Yin K, Keasling JD (2021) Biofuels for a sustainable future. Cell 184:1636–1647

  21. Luo H, Yang R, Zhao Y, Wang Z, Liu Z, Huang M, Zeng Q (2018) Recent advances and strategies in process and strain engineering for the production of butyric acid by microbial fermentation. Bioresour Technol 253:343–354

  22. Luo H, Zheng P, Xie F, Yang R, Liu L, Han S, Zhao Y, Bilal M (2019) Co-production of solvents and organic acids in butanol fermentation by Clostridium acetobutylicum in the presence of lignin-derived phenolics. RSC Adv 9:6919–6927

  23. Luo H, Zheng P, Bilal M, Xie F, Zeng Q, Zhu C, Yang R, Wang Z (2020) Efficient bio-butanol production from lignocellulosic waste by elucidating the mechanisms of Clostridium acetobutylicum response to phenolic inhibitors. Sci Total Environ 710:136399

  24. Luo H, Liu Z, Xie F, Bilal M, Liu L, Yang R, Wang Z (2021a) Microbial production of gamma-aminobutyric acid: applications, state-of-the-art achievements, and future perspectives. Crit Rev Biotechnol 41:491–512

  25. Luo H, Liu Z, Xie F, Bilal M, Peng F (2021b) Lignocellulosic biomass to biobutanol: Toxic effects and response mechanism of the combined stress of lignin-derived phenolic acids and phenolic aldehydes to Clostridium acetobutylicum. Ind Crop Prod 170:113722

  26. Lv X, Xiong C, Li S, Chen X, Xiao W, Zhang D, Li J, Gong Y, Lin J, Liu Z (2017) Vacuum-assisted alkaline pretreatment as an innovative approach for enhancing fermentable sugar yield and decreasing inhibitor production of sugarcane bagasse. Bioresour Technol 239:402–411

  27. Moodley P, Rorke DCS, Gueguim Kana EB (2019) Development of artificial neural network tools for predicting sugar yields from inorganic salt-based pretreatment of lignocellulosic biomass. Bioresour Technol 273:682–686

  28. Pratto B, Chandgude V, de Sousa R, Cruz AJG, Bankar S (2020) Biobutanol production from sugarcane straw: Defining optimal biomass loading for improved ABE fermentation. Ind Crop Prod 148:112265

  29. Puig-Arnavat M, Hernández JA, Bruno JC, Coronas A (2013) Artificial neural network models for biomass gasification in fluidized bed gasifiers. Biomass Bioenergy 49:279–289

  30. Rajan K, Elder T, Abdoulmoumine N, Carrier DJ, Labbé N (2020) Understanding the in situ state of lignocellulosic biomass during ionic liquids-based engineering of renewable materials and chemicals. Green Chem 22:6748–6766

  31. Rashid T, Taqvi SAA, Sher F, Rubab S, Thanabalan M, Bilal M, ul Islam B (2021) Enhanced lignin extraction and optimisation from oil palm biomass using neural network modelling. Fuel 293:120485

  32. Schutyser W, Renders T, Van den Bosch S, Koelewijn SF, Beckham GT, Sels BF (2018) Chemicals from lignin: an interplay of lignocellulose fractionation, depolymerisation, and upgrading. Chem Soc Rev 47:852–908

  33. Sewsynker-Sukai Y, Gueguim Kana EB (2018) Microwave-assisted alkalic salt pretreatment of corn cob wastes: process optimization for improved sugar recovery. Ind Crop Prod 125:284–292

  34. Siqueira G, Arantes V, Saddler JN, Ferraz A, Milagres AMF (2017) Limitation of cellulose accessibility and unproductive binding of cellulases by pretreated sugarcane bagasse lignin. Biotechnol Biofuels 10:176

  35. Sivagurunathan P, Kumar G, Mudhoo A, Rene ER, Saratale GD, Kobayashi T, Xu K, Kim S-H, Kim D-H (2017) Fermentative hydrogen production using lignocellulose biomass: an overview of pre-treatment methods, inhibitor effects and detoxification experiences. Renew Sust Energ Rev 77:28–42

  36. Solarte-Toro JC, Romero-García JM, Martínez-Patiño JC, Ruiz-Ramos E, Castro-Galiano E, Cardona-Alzate CA (2019) Acid pretreatment of lignocellulosic biomass for energy vectors production: a review focused on operational conditions and techno-economic assessment for bioethanol production. Renew Sust Energ Rev 107:587–601

  37. Sunphorka S, Chalermsinsuwan B, Piumsomboon P (2017) Artificial neural network model for the prediction of kinetic parameters of biomass pyrolysis from its constituents. Fuel 193:142–158

  38. Tang Q, Chen Y, Yang H, Liu M, Xiao H, Wang S, Chen H, Raza Naqvi S (2021) Machine learning prediction of pyrolytic gas yield and compositions with feature reduction methods: effects of pyrolysis conditions and biomass characteristics. Bioresour Technol 339:125581

  39. Unrean P (2016) Bioprocess modelling for the design and optimization of lignocellulosic biomass fermentation. Bioresour Bioprocess 3:1

  40. Vani S, Sukumaran RK, Savithri S (2015) Prediction of sugar yields during hydrolysis of lignocellulosic biomass using artificial neural network modeling. Bioresour Technol 188:128–135

  41. Xia Q, Chen C, Yao Y, Li J, He S, Zhou Y, Li T, Pan X, Yao Y, Hu L (2021) A strong, biodegradable and recyclable lignocellulosic bioplastic. Nat Sustain 4:627–635

  42. Xu GC, Ding JC, Han RZ, Dong JJ, Ni Y (2016) Enhancing cellulose accessibility of corn stover by deep eutectic solvent pretreatment for butanol fermentation. Bioresour Technol 203:364–369

  43. Xu L, Zhu L, Dai Y, Gao S, Wang Q, Wang X, Chen X (2021) Impact of yeast fermentation on nutritional and biological properties of defatted adlay (Coix lachryma-jobi L.). LWT Food Sci Technol 137:110396

  44. Yang J, Huang Y, Xu HY, Gu DY, Xu F, Tang JT, Fang C, Yang Y (2020a) Optimization of fungi co-fermentation for improving anthraquinone contents and antioxidant activity using artificial neural networks. Food Chem 313:126138

  45. Yang X, Han D, Zhao Y, Li R, Wu Y (2020b) Environmental evaluation of a distributed-centralized biomass pyrolysis system: a case study in Shandong, China. Sci Total Environ 716:136915

  46. Yao L, Yang H, Yoo CG, Chen C, Meng X, Dai J, Yang C, Yu J, Ragauskas AJ, Chen X (2021) A mechanistic study of cellulase adsorption onto lignin. Green Chem 23:333–339

  47. Yuan Y, Jiang B, Chen H, Wu W, Wu S, Jin Y, Xiao H (2021) Recent advances in understanding the effects of lignin structural characteristics on enzymatic hydrolysis. Biotechnol Biofuels 14:205

  48. Zabed H, Sahu JN, Boyce AN, Faruq G (2016) Fuel ethanol production from lignocellulosic biomass: an overview on feedstocks and technological approaches. Renew Sust Energ Rev 66:751–774

  49. Zhang H, Han L, Dong H (2021) An insight to pretreatment, enzyme adsorption and enzymatic hydrolysis of lignocellulosic biomass: experimental and modeling studies. Renew Sust Energ Rev 140:110758

  50. Zhao X, Meng X, Ragauskas AJ, Lai C, Ling Z, Huang C, Yong Q (2021) Unlocking the secret of lignin-enzyme interactions: recent advances in developing state-of-the-art analytical techniques. Biotechnol Adv.



Not applicable.


This work is supported by the National Natural Science Foundation of China (21808075), the Natural Science Foundation of Jiangsu Province (BK20170459), and the Science and Technology Innovation Project of Huaiyin Institute of Technology (HGYK202106).

Author information




HL: supervision, conceptualization, methodology, funding acquisition, writing—reviewing and editing; LG, ZL, and YS: investigation, software, methodology, writing—original draft; FX: data curation, visualization; MB: investigation, writing—review and editing; RY: investigation, visualization; MJT: investigation, writing—review and editing.

Corresponding author

Correspondence to Hongzhen Luo.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

All authors have read the manuscript and approved its submission to Bioresources and Bioprocessing.

Competing interests

The authors declare that they have no competing interests.

Supplementary Information

Additional file 1: Table S1. The input data for development of the ANN model to predict CGlc and CPhe in biomass hydrolysate after dilute inorganic acid pretreatment and enzymatic hydrolysis.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit

About this article

Cite this article

Luo, H., Gao, L., Liu, Z. et al. Prediction of phenolic compounds and glucose content from dilute inorganic acid pretreatment of lignocellulosic biomass using artificial neural network modeling. Bioresour. Bioprocess. 8, 134 (2021).


  • Lignocellulosic biomass
  • Dilute acid pretreatment
  • Enzymatic hydrolysis
  • Phenolic compounds
  • Artificial neural network
  • Modeling