The rapid rise in DNA sequencing has led to an expansion in the number of glycoside hydrolase (GH) families. The GH43 family currently contains α-L-arabinofuranosidase, β-D-xylosidase, α-L-arabinanase, and β-D-galactosidase enzymes for the debranching and degradation of hemicellulose and pectin polymers. Many studies have revealed finer details about members of GH43 that necessitate the division of GH43 into subfamilies, as was done previously for the GH5 and GH13 families. The work presented here is a robust subfamily classification that assigns over 91% of all complete GH43 domains into 37 subfamilies that correlate with conserved sequence residues and results of biochemical assays and structural studies. Furthermore, cooccurrence analysis of these subfamilies and other functional modules revealed strong associations between some GH43 subfamilies and CBM6 and CBM13 domains. Cooccurrence analysis also revealed the presence of proteins containing up to three GH43 domains and belonging to different subfamilies, suggesting significant functional differences for each subfamily. Overall, the subfamily analysis suggests that the GH43 enzymes probably display a hitherto underestimated variety of subtle specificity features that are not apparent when the enzymes are assayed with simple synthetic substrates, such as pNP-glycosides.
Carbohydrates serve a range of functional purposes in biological systems, including energy storage, signal transduction, and intracellular trafficking, among others (1). Importantly, carbohydrates are the main end product of plant primary production, representing a large majority of carbon fixation by plants (2). As a photosynthetically renewable form of fixed carbon, plant biomass represents a prime target for the replacement of petroleum-derived fuels for future sustainability efforts. The enzymatic degradation and modification of carbohydrates have thus been cast to the forefront of biofuel production research (3).
As functional efforts to discover plant cell wall polysaccharide (PCWP)-degrading enzymes identify novel activities and mechanisms (4, 5), it is important to derive and maintain a concise classification system for these enzymes. A sequence-based classification of carbohydrate-active enzymes (CAZymes) began in 1991 (6), with the classification of 35 families of glycoside hydrolases (GHs). Today the CAZy database (7) comprises 5 separate enzyme classes, namely, the aforementioned GHs, glycosyltransferases (GTs), polysaccharide lyases (PLs), carbohydrate esterases (CEs), and auxiliary activities (AAs), as well as associated carbohydrate-binding modules (CBMs), that together correspond to over 530,000 individual sequences (at the time of submission of this article). The largest of these classes are the glycoside hydrolases, currently represented by over 241,000 sequences classified into 133 families based on amino acid sequence similarity.
The rapid advancements in DNA sequencing over the past decade have exponentially increased the number of sequences assigned to each family. Henc...