Glycans are fundamental cellular building blocks, involved in many organismal functions. Advances in glycomics are elucidating the roles of glycans, but it remains challenging to properly analyze large glycomics datasets, since the data are sparse (each sample often has only a few measured glycans) and detected glycans are non-independent (sharing many intermediate biosynthetic steps). We address these challenges with GlyCompare, a glycomic data analysis approach that leverages shared biosynthetic pathway intermediates to correct for sparsity and non-independence in glycomics. Specifically, quantities of measured glycans are propagated to intermediate glycan substructures, which enables direct comparison of different glycoprofiles and increases statistical power. Using GlyCompare, we studied diverse N-glycan profiles from glycoengineered erythropoietin. We obtained biologically meaningful clustering of mutant cell glycoprofiles and identified knockout-specific effects of fucosyltransferase mutants on tetra-antennary structures. We further analyzed human milk oligosaccharide profiles and identified novel impacts of the mother's secretor status on fucosylation and sialylation. Our substructure-oriented approach will enable researchers to take full advantage of the growing power and size of glycomics data.

... the rapid generation of many glycoprofiles with detailed glycan composition 7-10, exposing the complex and heterogeneous glycosylation patterns on lipids and proteins 11,12. Large glycoprofile datasets and supporting databases are also emerging, including GlyTouCan 13, UnicarbDB 14, GlyGen and UniCarbKB 15.

These new technologies and databases provide opportunities to examine global trends in glycan function and their association with disease. However, the rapid and accurate comparison of glycoprofiles can be challenging given the size, sparsity and heterogeneity of such datasets.
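To make the propagation of measured glycan quantities to substructures concrete, the following is a minimal sketch rather than the GlyCompare implementation. It assumes that propagation means summing the abundance of every measured glycan that contains a given substructure. The glycan names (`G1`, `G2`, `G3`), the substructure labels, the mapping `GLYCAN_SUBSTRUCTURES` and the function `substructure_abundance` are all hypothetical and chosen only for illustration; in practice the substructures would be derived from the glycan structures themselves.

```python
from collections import defaultdict

# Hypothetical toy mapping: each glycan -> the substructures (biosynthetic
# intermediates) it contains. Labels are illustrative, not real nomenclature.
GLYCAN_SUBSTRUCTURES = {
    "G1": {"core", "core+GlcNAc"},
    "G2": {"core", "core+GlcNAc", "core+GlcNAc+Gal"},
    "G3": {"core", "core+Fuc"},
}


def substructure_abundance(glycoprofile):
    """Sum each glycan's abundance into every substructure it contains.

    Assumption: propagation is a plain sum over the glycans that contain
    the substructure.
    """
    totals = defaultdict(float)
    for glycan, abundance in glycoprofile.items():
        for substructure in GLYCAN_SUBSTRUCTURES[glycan]:
            totals[substructure] += abundance
    return dict(totals)


# Two sparse glycoprofiles (relative abundances) with no glycan in common.
sample_a = {"G1": 0.6, "G3": 0.4}
sample_b = {"G2": 1.0}

shared_glycans = set(sample_a) & set(sample_b)

profile_a = substructure_abundance(sample_a)
profile_b = substructure_abundance(sample_b)
shared_substructures = set(profile_a) & set(profile_b)

print("shared glycans:", sorted(shared_glycans))
print("shared substructures:", sorted(shared_substructures))
```

In this toy case the two samples share no measured glycan, yet after propagation they can be compared on the two substructures they have in common, which is the sense in which a substructure representation reduces sparsity.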
Indeed, in any one glycoprofile, only a few glycans may be detected among the thousands of possible glycans 16. Thus, if there is a major perturbation to glycosylation in a dataset, few glycans, if any, may overlap between samples. However, these non-overlapping glycans may differ in their synthesis by as few as one enzymatic step, so it can be difficult to know which glycans to compare. Furthermore, since glycans often share substantial portions of their biosynthetic pathways with each other, statistical methods that assume independence (e.g., t-tests, ANOVA) are inappropriate for glycomics. Here we address these challenges by proposing glycan substructures, or intermediates, as the appropriate functional units for meaningful glycoprofile comparisons, since each substructure can capture one step in the complex process of glycan synthesis. Thus, using substructures for comparison, we account for the shared dependencies across glycans.

Previous work has investigated the similarity across glycans using glycan motifs, ...