Current understanding of substrate specificity and regioselectivity of LPMOs

Renewable biomass such as cellulose and chitin is the most abundant sustainable source of energy and materials. However, because conventional hydrolases degrade these recalcitrant substrates inefficiently, these biomass resources cannot be utilized efficiently. In 2010, the discovery of lytic polysaccharide monooxygenases (LPMOs) led to a major breakthrough. Currently, LPMOs are distributed in 7 families in the CAZy database, including AA9–11 and AA13–16, with different species origins, substrate specificities and oxidative regioselectivities. Effective application of LPMOs in the biotransformation of biomass resources requires elucidation of the molecular basis of their function. Since the discovery of LPMOs, great advances have been made in the study of their substrate specificity and regioselectivity, as well as their structural basis, which are reviewed below.


Introduction
Biocatalytic degradation of renewable biomass resources is a potential way to address energy and environmental crises. Although cellulose and chitin are abundant, their crystalline structure hinders access by hydrolases, and thus effective saccharification by traditional glycoside hydrolase systems. In 1950, Reese et al. postulated that the degradation of cellulose by cellulolytic organisms involves two steps (Reese et al. 1950). First, the 'C1' component degrades native cellulose into shorter linear polyanhydroglucose chains, which are then hydrolyzed by 'Cx' into small soluble molecules. In 1974, Eriksson et al. reported the presence of an oxidase in the extracellular enzyme system of Sporotrichum pulverulentum, which boosted the degradation of cellulose by a mixture of endo- and exo-glucanases (Eriksson et al. 1974). However, this oxidase remained poorly characterized for a long time.
The first structure of Cel61B (a member of the GH61 family) was resolved in 2008 and revealed differences from other glycoside hydrolases, suggesting that it may have a different enzyme activity (Karkehabadi et al. 2008). It was not until 2010 that Vaaje-Kolstad et al. reported that the bacterial CBP21 protein (a member of the CBM33 family) is actually an enzyme that catalyzes oxidative depolymerization of chitin (Vaaje-Kolstad et al. 2010). Shortly thereafter, the cellulose-oxidizing activities of GH61 family members were characterized (Quinlan et al. 2011). These Cu-dependent enzymes were then named lytic polysaccharide monooxygenases (LPMOs), and the GH61 and CBM33 families were reclassified as AA9 (Auxiliary Activity family 9) and AA10, respectively. Currently, LPMOs are distributed in 7 Auxiliary Activity families in the CAZy database (www.cazy.org), with various origins and substrate specificities: AA9s, AA11s, AA13s, AA14s and AA16s are mainly from eukaryota, with cellulose, chitin, starch and xylan activity, respectively; AA10s are from bacteria, eukaryota, viruses or archaea, with cellulose or chitin activity; AA15s are from eukaryota (including insects) or viruses, with cellulose or chitin activity. The currently reported cleavage of chitin, starch and xylan substrates is C1-oxidized, while the cleavage of cellulosic substrates is C1- or C4-oxidized, or both. The information on currently characterized LPMOs is summarized in Table 1.

Open Access. *Correspondence: zhuhh@gdim.cn. State Key Laboratory of Applied Microbiology Southern China, Guangdong Provincial Key Laboratory of Microbial Culture Collection and Application, Guangdong Microbial Culture Collection Center (GDMCC), Guangdong Institute of Microbiology, Guangdong Academy of Sciences, Guangzhou 510070, China
Despite their low sequence identities, the catalytic domains of these LPMOs share some common structural features (Fig. 1), as recently reviewed (Beeson et al. 2015; Hemsworth et al. 2013a; Span and Marletta 2015; Vaaje-Kolstad et al. 2017). The core of the catalytic domain is a β-sandwich of seven to nine β-strands. Loops connecting these β-strands constitute the 'flat' substrate binding surface, which is believed to interact with the flat surfaces of crystalline substrates. The region located between β1 and β2 of LPMO9 (between β1 and β3 of LPMO10), denoted L2, includes a variable number of loops and short helices. Some LPMOs have an insertion between β3 and β4, denoted L3, which interacts with L2. In AA9 and AA13 LPMOs, there is an LS (loop short) region on the opposite side of L2. In addition, AA9 members have a long C-terminal loop, termed LC. As discussed below, the variable length and amino acid composition of these loops might contribute to substrate specificity and regioselectivity. The N-terminal histidine and a second conserved histidine coordinate a copper ion, forming the 'histidine brace'. The N-terminal histidine of some fungal LPMOs is methylated at Nε2, and the significance of this methylation is unclear.
Studies have shown that adding LPMOs to cellulase cocktails can improve the degradation efficiency of cellulosic biomass and reduce the amount of enzyme required (de Gouvea et al. 2019; Dimarogona et al. 2013; Harris et al. 2010; Hemsworth et al. 2015; Zhang et al. 2019). It is speculated that this synergy is due to the oxidative cleavage of crystalline polysaccharide regions by LPMOs, which provides more accessible sites for glycoside hydrolases (Fig. 2). Further elucidation of the biological functions and catalytic mechanisms of these enzymes will open up more possibilities for their application in the utilization of renewable biomass resources. The catalytic mechanism of LPMOs remains under debate. One view is that the catalytic center Cu(II) is activated by reduction to Cu(I) by two external electrons (Kjaergaard et al. 2014; Kracher et al. 2016). The Cu(I) activates dioxygen, leading to hydrogen abstraction from one of the carbons in the scissile glycosidic bond. Hydroxylation of the resulting substrate radical then leads to bond cleavage via an elimination reaction. In other studies, however, it has been proposed that, instead of dioxygen, H2O2 is the preferred co-substrate for LPMOs, in a peroxygenase reaction in which a single priming reduction to Cu(I) is needed. The catalytic mechanism of LPMOs has been extensively reviewed (Tandrup et al. 2018; Walton and Davies 2016) and is not discussed in depth here. The focus of this review is the current understanding of the substrate specificity and oxidative regioselectivity of LPMOs and their structural basis.

Substrate specificity
AA9 (formerly GH61) and AA10 (formerly CBM33) were originally found to act on crystalline cellulose and chitin substrates, respectively. As more related proteins have been characterized, the broad substrate spectrum of the LPMO superfamily has been revealed. Besides insoluble substrates (such as cellulose, chitin, starch and xylan), soluble oligosaccharides such as xyloglucan, glucomannan and β-(1→3),(1→4)-d-glucan have been found to be oxidized by some LPMOs (Isaksen et al. 2014; Kojima et al. 2016). Biochemical characterization and structural studies, especially the structures of LPMOs in complex with soluble oligosaccharide substrates, have greatly deepened our understanding of LPMOs (Frandsen et al. 2016; Simmons et al. 2017). Detailed sequence and structure comparisons have revealed that the substrate binding surfaces of LPMOs with different substrate specificities differ in amino acid composition and topological features. Since the L2, L3, LS and LC loops constitute the majority of the substrate binding surface, and their amino acid composition is highly variable, these loops are believed to affect substrate recognition and specificity.

Amino acid composition of the substrate binding surface
There are usually several aromatic amino acids on the substrate binding surface loops of LPMO9s (Fig. 3a, b). Structural studies and MD simulations found that the spatial distribution of these aromatic amino acids facilitates stacking interactions with the sugar units of cellulose substrates, although the enzymes may bind to the surface of the cellulose fibers in different directions (Wu et al. 2013). In the study by Wu et al., 100 ns MD simulations of PchGH61D on cellulose showed that the three tyrosines on the substrate binding surface bind tightly to polysaccharide chains in the substrate (the interaction energies were −10.86 kcal/mol for Y28, −10.17 kcal/mol for Y75 and −9.5 kcal/mol for Y198) and are the main contributors to substrate binding. In contrast, LPMO10s generally have only one aromatic amino acid involved in substrate binding, and LPMO11s and LPMO13s have no aromatic amino acids on the substrate binding surface at all (Fig. 3a); their polar amino acids are more abundant and possibly bind substrates through polar interactions (Forsberg et al. 2014a; Hemsworth et al. 2014). Structural studies and site-directed mutagenesis revealed that binding of CBP21 to chitin is mediated primarily by conserved, solvent-exposed, hydrophilic residues, which are arranged in a patch on the substrate binding surface (Aachmann et al. 2012; Vaaje-Kolstad et al. 2005b). MD simulations of CBP21 on crystalline chitin substrates have also shown that, although the only tyrosine (Y54) on the substrate binding surface is a key factor, the hydrogen bonds formed between the substrate and residues E55, T111, H114, Q57 and D182 are very important for substrate binding.
Within the AA10 family, the amino acid composition of the substrate binding surface also differs among LPMOs with different substrate specificities. The Gln-Thr pair (Q78 and T133 in CjLPMO10A) is presumed to be a determinant of chitin activity, since it is conserved in chitin-active LPMO10s, whereas in cellulose-active LPMO10s the corresponding sites are Phe and Trp. Li et al. suggested that, compared with chitin-active SmAA10A, an insertion in the cellulose-active ScAA10C that contains four aromatic residues could account for cellulose specificity (Li et al. 2012). In previous work, we found a motif on L2 whose amino acid composition differs between LPMO10s with different substrate specificities (Fig. 3c) (Zhou et al. 2019b). In cellulose-active LPMO10s, this motif mainly consists of non-polar amino acids (Y[W]NWF[N]G[A]V[N]L[Y]), whereas in chitin-active LPMO10s it mainly consists of polar amino acids (Y[W]EPQSVE). We speculated that the different amino acid composition of this motif may lead to differences in the electrostatic potential of the substrate binding surface, which in turn affects substrate specificity. Jensen et al. constructed a mutation library of five sites on the substrate binding surface of ScLPMO10C, three of which are located in this motif region (Y79, N80, F82), while the other two are located in adjacent loops (Y111, W141). The substrate specificity of the mutant M18 (Y79/N80D/F82A/Y111F/W141Q) changed significantly, from the cellulose preference of the wild type to a chitin preference, demonstrating the role of these residues in substrate specificity.

[Figure caption: Exo-glycosidases (Exo-GH) act on chain ends to generate soluble sugars. LPMOs and endo-glycosidases (Endo-GH) act on crystalline and amorphous regions within the polysaccharide chain, respectively, providing more accessible sites for exo-glycosidases. The enlarged dotted box is a schematic diagram of the action mechanism of the LPMOs. Oxidation at C1 results in the formation of a lactone at the reducing end. C4 oxidation produces a ketoaldose at the non-reducing end.]
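The L2 motif comparison described above can be illustrated with a minimal Python sketch. It assumes that the bracket notation in the motifs means "the preceding residue or the bracketed alternative" (so `Y[W]` is read as Y or W), which gives two seven-residue patterns. The function name `classify_l2_motif` and the test sequences are hypothetical and are not taken from the cited work; a match only indicates resemblance to one motif and is not a prediction of enzyme activity.

```python
import re

# Assumption: "Y[W]" in the text is read as "Y or W at this position".
# Cellulose-type motif: Y[W] N W F[N] G[A] V[N] L[Y]
CELLULOSE_MOTIF = re.compile(r"[YW]NW[FN][GA][VN][LY]")
# Chitin-type motif: Y[W] E P Q S V E
CHITIN_MOTIF = re.compile(r"[YW]EPQSVE")


def classify_l2_motif(l2_sequence: str) -> str:
    """Label an L2 sequence by which of the two motifs it contains."""
    seq = l2_sequence.upper()
    has_cellulose = CELLULOSE_MOTIF.search(seq) is not None
    has_chitin = CHITIN_MOTIF.search(seq) is not None
    if has_cellulose and has_chitin:
        return "ambiguous"
    if has_cellulose:
        return "cellulose-like"
    if has_chitin:
        return "chitin-like"
    return "unknown"


if __name__ == "__main__":
    # Hypothetical sequences, used only to exercise the function.
    for seq in ("AAYNWFGVLAA", "GGWEPQSVETT", "ACDEFG"):
        print(seq, "->", classify_l2_motif(seq))
```

A regular expression is used here only because the motifs are short and contiguous; for real sequence sets, a profile built from a multiple sequence alignment would be the more usual choice.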
The structures of LsAA9A in complex with soluble oligosaccharide substrates showed that, in addition to stacking with Y203, the hydrogen bond network formed between the +2 subsite and polar residues (N28, H66 and N67) plays an important role in substrate binding. This network may be a determinant of activity on soluble oligosaccharides, as sequence and structure alignments found no corresponding residues forming such a hydrogen bond network in LPMOs that act only on crystalline substrates (Frandsen et al. 2016).

The topological features of substrate binding surface
The crystal structure of BaAA10A shows a cavity near the catalytic Cu center, and the authors speculated that it is for dioxygen binding (Fig. 3a) (Hemsworth et al. 2013b). Shortly thereafter, through structural comparisons, Forsberg et al. found that this cavity is absent in the cellulose-active LPMO10s (Forsberg et al. 2014a). Therefore, the cavity was presumed to accommodate the N-acetyl group of chitin substrates, and it may be a structural feature that determines substrate specificity. However, one exception is the chitin-active CjLPMO10A, which lacks this cavity and thus resembles cellulose-active LPMO10s.
LPMOs that can act on oligosaccharides, such as LsAA9A, NcLPMO9C and NcLPMO9D, have a more contoured substrate binding surface than LPMOs that can only act on crystalline substrates (Borisova et al. 2015; Frandsen et al. 2016; Li et al. 2012). The ridge near substrate binding subsites +1 and +2 was proposed to allow LPMOs to bind more contoured substrates such as oligosaccharides (Fig. 3a).
In AoAA13, the surface loops (the long loop preceding β2, the loop between β2 and β3, the long loop preceding β4 and the loop between β5 and β6) form a shallow groove, crossing the copper active site (Fig. 3a) (Lo Leggio et al. 2015). It was speculated that, compared with the flatter substrate binding surface of LPMO9s, which is more suitable for the binding of flatter crystalline cellulose substrates, the groove on the surface of AoAA13 might be more suitable for the binding of the contoured surface of resistant starch. It is worth noting that no crystal structures of the currently characterized LPMO13s have been resolved so far, and the structurally characterized AoAA13 has not been reported to have starch activity.
Similarly, the substrate binding surface of PcAA14B, a xylan-active LPMO, has a rippled shape with a clamp formed by two prominent surface loops, which are equivalent to the L2 and L3 regions of AA9 (Figs. 1 and 3a). The extended L3 loop of PcAA14B forms a protrusion through the cystines (C67-C90). Although there is no enzyme-substrate complex structure, these loops constitute a large part of the substrate binding surface, and it is speculated that this clamp is a structural feature of LPMO14s required for binding the xylan substrate (Couturier et al. 2018).
From the sequence alignment of PaLPMO9H and NcLPMO9C, it was speculated that the L3 loop, which is a common feature of these two enzymes, might be a prerequisite for xyloglucan specificity (Bennati-Granier et al. 2015). NMR (nuclear magnetic resonance) studies on enzyme-substrate interactions also showed that L3 of NcLPMO9C does participate in the binding of xyloglucan substrate (Courtade et al. 2016). However, as more LPMOs have been characterized, some enzymes, such as GtLPMO9A-2, have been found to have xyloglucan activity although L3 is absent. It was presumed that the extended L2 of the xyloglucan-active GtLPMO9A-2 compensates for the lack of L3 (Kojima et al. 2016).

The appended modules
Similar to GHs (glycoside hydrolases), a considerable proportion of LPMOs are modular, with non-catalytic CBMs (carbohydrate-binding modules), GHs or domains of unknown function appended to the catalytic domain. Domain similarity network analysis has shown a correlation between the additional domains and the substrate specificity of the full-length enzymes (Book et al. 2014; Zhou et al. 2019b). CBM truncation studies have been reported for both LPMO9s and LPMO10s (Chalak et al. 2019; Courtade et al. 2018; Crouch et al. 2016; Forsberg et al. 2016; Laurent et al. 2019). Comparisons of the performance of LPMOs with and without CBMs have shown that deletion of CBMs reduces the binding capacity of LPMOs for crystalline substrates, especially at low substrate concentrations. Therefore, CBMs may affect substrate specificity by promoting the binding of LPMOs to the appropriate substrates.

Oxidative regioselectivity
LPMO9s have been shown to oxidize the C1 carbon, the C4 carbon, or both, at the scissile bond of cellulose substrates. According to their oxidative regioselectivity, LPMO9s have been classified into three types: PMO1s are the strict C1-oxidizers; PMO2s are the strict C4-oxidizers; PMO3s are the mixed C1/C4-oxidizers; and a subtype of PMO3, the PMO3*s, are C1-oxidizers (Vu et al. 2014a). Cellulose-active LPMO10s are strict C1-oxidizers or mixed C1/C4-oxidizers, whereas no strict C4-oxidizing LPMO10 has been reported. LPMOs acting on chitin (LPMO10s, 11s and 15s), starch (LPMO13s) and xylan (LPMO14s) have only been shown to oxidize the C1 carbon. It is speculated that the oxidative regioselectivity may be determined by the precise positioning of the enzyme on the substrate, so factors that affect the relative position of the enzyme's active center Cu and the C1 or C4 carbon of the scissile glycosidic bond may affect regioselectivity (Fig. 4).
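The product-based part of this classification can be written down as a small Python sketch. The function name `classify_lpmo9` and its boolean inputs are hypothetical and serve only as illustration. The sketch deliberately does not return PMO3*, because the text describes PMO3*s as a C1-oxidizing subtype of PMO3, so they cannot be told apart from PMO1s by oxidation products alone.

```python
def classify_lpmo9(oxidizes_c1: bool, oxidizes_c4: bool) -> str:
    """Map observed oxidation positions on cellulose to an LPMO9 type.

    Follows the PMO1/PMO2/PMO3 scheme summarized in the text.
    PMO3* (C1-oxidizing subtype of PMO3) is not distinguishable
    from PMO1 by products alone, so it is not returned here.
    """
    if oxidizes_c1 and oxidizes_c4:
        return "PMO3"  # mixed C1/C4-oxidizer
    if oxidizes_c1:
        return "PMO1"  # strict C1-oxidizer (or a PMO3* by sequence)
    if oxidizes_c4:
        return "PMO2"  # strict C4-oxidizer
    return "unclassified"  # no oxidized products observed


if __name__ == "__main__":
    print(classify_lpmo9(True, False))
    print(classify_lpmo9(False, True))
    print(classify_lpmo9(True, True))
```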

Amino acid composition and arrangement on substrate binding surface
Due to the contribution of L2 to the substrate binding surface and the diversity of its amino acid composition, many studies on the regioselectivity of LPMOs have focused on this region. By sequence alignment, Vu et al. found that PMO3s had a 12-amino acid insertion on L2, including a conserved tyrosine, compared to other subgroups of LPMO9s. Deletion of this sequence caused the loss of C4-oxidizing function of NCU07760, indicating the importance of this sequence for C4 regioselectivity of PMO3. However, although the conserved tyrosine in this insertion is a feature of PMO3, mutation of this residue into glycine did not change the regioselectivity of NCU07760 (Vu et al. 2014a).
Sequence and structural information show that the number and distribution of aromatic residues on the surfaces of LPMOs differ. Therefore, it is speculated that LPMOs may bind to substrates in different directions, resulting in different regioselectivities (Li et al. 2012). Recently, Danneels et al. studied the oxidative regioselectivity of LPMO9s in detail (Danneels et al. 2019). One part of the research was the mutation of aromatic amino acids on the substrate binding surfaces of PcLPMO9D, ScLPMO9C and HjLPMO9A. They found that the properties of these aromatic amino acids affect the C1/C4-oxidation ratios. In another work, Liu et al. used molecular dynamics simulations to study the binding mode of HiLPMO9B to the substrate, and found that multiple surface-exposed hydrophobic residues, including the tyrosine on L2, are important for substrate binding in this C1-specific LPMO. In addition, acidic amino acids on L2 and LC participate in substrate binding. In both binding modes, which were obtained with different binding directions, the catalytic center Cu is biased towards the C1 carbon of the glycosidic bond, suggesting that the arrangement of amino acids on the substrate binding surface may affect regioselectivity by affecting the relative position of the catalytic center Cu and the substrate.
Similar speculation has been made for LPMO10s. On the substrate binding surface of chitin-active C1-specific LPMO10s, the conserved amino acids involved in the formation of hydrogen bonds with the polysaccharide substrate are arranged on opposite sides of the catalytic center Cu, and thus direct the orientation of the substrate relative to the Cu. This directed binding makes the enzyme prone to act on the C1 carbon of the scissile glycosidic bond (Hemsworth et al. 2013b). Forsberg et al. mutated a subset of co-evolved residues of C1/C4-oxidizing MaLPMO10B into the corresponding residues of C1-oxidizing LPMO10s, and the resulting mutants lost the C4-oxidizing activity. They found that the residues located near the catalytic Cu that are involved in substrate positioning (especially N85 of MaLPMO10B) are the major determinants of regioselectivity.

Accessibility to the surface-exposed axial copper coordination site
A conserved alanine in the active site of LPMO10s has been postulated to provide steric congestion at the solvent-facing axial position of the active center Cu (Hemsworth et al. 2013b). Subsequent research showed that the loop hosting this alanine adopts different conformations in C1- and C1/C4-oxidizers, making the solvent-facing axial position of C1/C4-specific ScLPMO10B more open than that of C1-specific ScLPMO10C (Forsberg et al. 2014a). Similarly, structural comparisons revealed that strictly C1-oxidizing LPMO9s have a conserved tyrosine that prevents optimal axial access to the copper ion, whereas C4-oxidizing LPMO9s have open access to this position. The mixed C1/C4-oxidizing LPMO9s show an intermediate situation (Borisova et al. 2015). Thus, the accessibility of the surface-exposed axial position of Cu, or the ability to bind a ligand in the axial position, could be a determinant of C4-oxidizing activity. However, recent studies showed that mutations affecting the accessibility of this axial position did not change the regioselectivities of PcLPMO9D and MaLPMO10B (Danneels et al. 2019; Forsberg et al. 2018).

The appended CBM modules
The CBM domains seem to affect the binding of LPMOs to substrates, thereby affecting the precise positioning of the enzymes on the substrate surface, that is, the position of the C1 or C4 carbon relative to the catalytic center Cu, and thus the regioselectivity of the enzymes. Removing or replacing the endogenous CBMs of LPMO9s and LPMO10s has been reported to alter the regioselectivity of these enzymes. For instance, deleting CBM1 of PaLPMO9H significantly increased the proportion of C1-oxidized products (Laurent et al. 2019). Crouch et al. replaced the endogenous CBM2a domain of TbLPMO10 with the CBM10 of CjLPMO10B, and found that the ratio of non-oxidized to oxidized products of the mutant increased significantly. The authors speculated that the non-oxidized products are oligosaccharides derived from C1-oxidation near the reducing end of cellulose, which may be due to the grafted CBM affecting the localization of the enzyme on the substrate. However, the impact of CBMs on the regioselectivity of LPMOs is also controversial; for example, removing the CBM domains did not significantly change the regioselectivity of MaLPMO10B, NcLPMO9C and HjLPMO9A (Danneels et al. 2019; Forsberg et al. 2018; Laurent et al. 2019).

Fig. 4 Factors influencing oxidative regioselectivity

N-Glycan on substrate binding surface
Fungal LPMOs are generally glycosylated on the surface, but the function of this glycosylation is unclear. Sequence and structural information show that C1/C4-specific LPMO9s often have an N-glycan at the planar active surface, a feature that distinguishes them from the other two groups (Li et al. 2012). Mutation studies showed that removing this N-glycan can alter the C1/C4-oxidation ratios of HjLPMO9A. The authors suggested that this is because the N-glycan affects the structural features of the substrate binding surface, which in turn affects substrate binding and the precise direction of the oxidative attack (Danneels et al. 2019).

Structures of substrates
The regioselectivity of LPMOs appears to be substrate-dependent. The most typical examples are the LPMO10s with both cellulose and chitin activity. They are C1/C4-specific for cellulose oxidation and C1-specific for chitin oxidation. Recently, a multifunctional LPMO10, KpLPMO10A, has been reported which, besides its chitin and cellulose activity, can also act on xylan to produce C4-oxidized products (Correa et al. 2019). In addition, PaLPMO9H is reported to be C4-specific on mixed-linkage glucans and C1/C4-specific on glucomannan (Fanuel et al. 2017). LsAA9A and CvAA9A are reported to be C4-specific for shorter oligosaccharides and C1/C4-specific for longer polysaccharides (Simmons et al. 2017).
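Because regioselectivity depends on the enzyme and the substrate together, the observations in this section are naturally keyed by (enzyme, substrate) pairs. The Python sketch below records only the pairs stated in the paragraph above; the table name, the substrate labels and the `lookup_regioselectivity` helper are hypothetical choices made for illustration, and an absent entry means "not stated here", not "no activity".

```python
from typing import Optional

# (enzyme, substrate) -> reported oxidation position(s), as stated in the text.
# Substrate labels are illustrative shorthand, not standard nomenclature.
REPORTED_REGIOSELECTIVITY = {
    ("KpLPMO10A", "xylan"): "C4",
    ("PaLPMO9H", "mixed-linkage glucan"): "C4",
    ("PaLPMO9H", "glucomannan"): "C1/C4",
    ("LsAA9A", "short oligosaccharide"): "C4",
    ("LsAA9A", "long polysaccharide"): "C1/C4",
    ("CvAA9A", "short oligosaccharide"): "C4",
    ("CvAA9A", "long polysaccharide"): "C1/C4",
}


def lookup_regioselectivity(enzyme: str, substrate: str) -> Optional[str]:
    """Return the reported regioselectivity, or None if the pair is not listed."""
    return REPORTED_REGIOSELECTIVITY.get((enzyme, substrate))


if __name__ == "__main__":
    print(lookup_regioselectivity("PaLPMO9H", "glucomannan"))
    print(lookup_regioselectivity("PaLPMO9H", "chitin"))
```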

Conclusions
Elucidating the molecular basis of the substrate specificity and oxidative regioselectivity of LPMOs will aid their application in the biotransformation of renewable biomass. Research indicates that the substrate binding and regioselectivity of LPMOs are precisely regulated. This precise regulation is based on complex synergistic modules and amino acid networks that evolved from interactions with the complex and diverse substrate structures found in nature. However, the characterized LPMOs represent only a small fraction of the sequences identified so far, and more enzymatic and structural characterization is needed. Structure-based mutation studies and MD simulations will provide an in-depth understanding of the molecular basis of LPMO function. In addition, given the complexity and structural characteristics of the substrates, more effective methods for detecting enzyme activity need to be developed so that weak enzyme activities are not overlooked.