Protein O-glycosylation has long been
recognized to be closely associated with many diseases, particularly
with tumor proliferation, invasion, and metastasis. The ability to
efficiently profile the variation of O-glycosylation
in large-scale clinical samples provides an important approach for
the development of biomarkers for cancer diagnosis and for therapeutic
response evaluation. Therefore, mass spectrometry (MS)-based techniques
for high-throughput, in-depth, and reliable elucidation of protein O-glycosylation in large clinical cohorts are in high demand.
However, the prevalence of serine and threonine residues in the
proteome and the tens of mammalian O-glycan types
lead to an extremely large search space composed of millions of theoretical
combinations of peptides and O-glycans for intact O-glycopeptide database searching. As a result, an exceptionally
long time is required for database searching, which is a major obstacle
in O-glycoproteome studies of large clinical cohorts.
More importantly, because of the low abundance and poor ionization
of intact O-glycopeptides and the stochastic nature
of data-dependent MS2 acquisition, substantially elevated missing
data levels are inevitable as the sample number increases, which undermines
the quantitative comparison across samples. Here, we report a
new MS data processing strategy that integrates glycoform-specific
database searching, reference library-based MS1 feature matching, and
MS2 identification propagation for fast identification and in-depth,
reproducible label-free quantification of O-glycosylation
of human urinary proteins. This strategy increases database search
speed by up to 20-fold and yields a 30%–40% increase in intact O-glycopeptide quantification in individual samples, with
clearly improved reproducibility. In total, we identified 1300
intact O-glycopeptides in 36 urine samples from healthy humans,
with a 30%–40% reduction in the amount of missing data.
This is currently the largest dataset of the urinary O-glycoproteome and demonstrates the application potential of this
new strategy in large-scale clinical investigations.
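
To illustrate the general idea behind reference library-based MS1 feature matching, the following minimal Python sketch assigns library identifications to MS1 features in a sample that lack their own MS2 identification, and then computes the fraction of missing values across samples. It is not the implementation used in this work: the class and function names, the example values, and the tolerances (10 ppm in m/z, 1 min in retention time) are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LibraryEntry:
    glycopeptide: str  # hypothetical identifier of an intact O-glycopeptide
    mz: float          # reference precursor m/z
    rt_min: float      # reference retention time in minutes


@dataclass(frozen=True)
class Ms1Feature:
    mz: float
    rt_min: float
    intensity: float


def ppm_error(observed: float, reference: float) -> float:
    """Signed mass error in parts per million."""
    return (observed - reference) / reference * 1e6


def match_features(library, features, ppm_tol=10.0, rt_tol_min=1.0):
    """Assign each library entry the in-tolerance feature closest in RT.

    The tolerances are illustrative assumptions, not values from the text.
    """
    quantified = {}
    for entry in library:
        candidates = [
            f for f in features
            if abs(ppm_error(f.mz, entry.mz)) <= ppm_tol
            and abs(f.rt_min - entry.rt_min) <= rt_tol_min
        ]
        if candidates:
            best = min(candidates, key=lambda f: abs(f.rt_min - entry.rt_min))
            quantified[entry.glycopeptide] = best.intensity
    return quantified


def missing_fraction(library, per_sample):
    """Fraction of (glycopeptide, sample) cells without a quantitative value."""
    total = len(library) * len(per_sample)
    if total == 0:
        return 0.0
    filled = sum(len(q) for q in per_sample)
    return 1.0 - filled / total


# Invented example data: two library entries and two samples.
library = [
    LibraryEntry("GP_A", 800.4000, 30.0),
    LibraryEntry("GP_B", 950.5000, 45.0),
]
sample_1 = match_features(library, [
    Ms1Feature(800.4040, 30.2, 1.5e6),  # within both tolerances of GP_A
    Ms1Feature(950.5000, 47.5, 2.0e5),  # RT too far from GP_B, not matched
])
sample_2 = match_features(library, [
    Ms1Feature(800.3990, 29.8, 1.2e6),
    Ms1Feature(950.5050, 45.3, 3.0e5),
])
missing = missing_fraction(library, [sample_1, sample_2])
```

In this toy example, one of the four glycopeptide-by-sample cells stays empty because the only candidate feature falls outside the retention time window. A real workflow would additionally need retention time alignment between runs and some control of false matches, neither of which is sketched here.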