This study significantly expands both the scope and method of identification
for construction of a previously reported tandem mass spectral library
of 74 human milk oligosaccharides (HMOs) derived from results of combined
LC-MS/MS experiments and comprehensive structural analysis of HMOs.
In the present work, a hybrid search “bootstrap” identification
method was employed that substantially broadens the coverage of milk
oligosaccharides and thereby increases the utility of a spectrum library-based
method for the rapid tentative identification of all distinguishable
glycans in milk. This involved hybrid searching of the previous library,
which was itself constructed using the hybrid search of oligosaccharide
spectra in the NIST 17 Tandem MS Library. The general approach appears
applicable to library construction of other classes of compounds.
The coverage of oligosaccharides was significantly extended using
milks from a variety of mammals, including bovine, Asian buffalo,
African lion, and goat. This new method led to the identification
of another 145 oligosaccharides, including an additional 80 HMOs from
reanalysis of human milk. The newly identified compounds were added
to a freely available mass spectral reference database, which now contains
219 milk oligosaccharides. We also provide suggestions to overcome several
limitations and pitfalls in the interpretation of spectra of unknown
oligosaccharides.
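
To make the hybrid search idea concrete, the following is a minimal sketch of its general principle: a library peak may match a query peak either directly or after being shifted by the precursor mass difference between the two compounds. It is a simplified illustration only, not the NIST implementation and not the procedure used in this study. The function name `hybrid_match_score`, the tolerance, the square-root intensity weighting, the assumption of singly charged ions, and the example peak lists are all assumptions made for illustration.

```python
import math

def hybrid_match_score(query, library, delta_mass, tol=0.01):
    """Cosine-like score between two peak lists of (m/z, intensity).

    delta_mass is the query precursor m/z minus the library precursor m/z
    (singly charged ions assumed). A library peak counts as matched if a
    query peak lies at the same m/z (direct match) or at m/z + delta_mass
    (shifted match), within the tolerance tol.
    """
    total_q = sum(inten for _, inten in query)
    total_l = sum(inten for _, inten in library)
    if total_q == 0 or total_l == 0:
        return 0.0

    used = set()  # each query peak may be matched at most once
    dot = 0.0
    for lib_mz, lib_int in library:
        best = None
        for i, (q_mz, q_int) in enumerate(query):
            if i in used:
                continue
            direct = abs(q_mz - lib_mz) <= tol
            shifted = abs(q_mz - (lib_mz + delta_mass)) <= tol
            if (direct or shifted) and (best is None or q_int > query[best][1]):
                best = i
        if best is not None:
            used.add(best)
            # square-root intensity weighting (an illustrative choice)
            dot += math.sqrt(lib_int * query[best][1])
    return dot / math.sqrt(total_q * total_l)

# Hypothetical peak lists: the query differs from the library compound
# by roughly one hexose unit (about 162.05 Da).
library_peaks = [(204.09, 50.0), (366.14, 100.0)]
query_peaks = [(204.09, 50.0), (528.19, 100.0)]
print(hybrid_match_score(query_peaks, library_peaks, delta_mass=162.05))
print(hybrid_match_score(query_peaks, library_peaks, delta_mass=0.0))
```

With the mass shift applied, both library peaks find a partner in the query; without it, only the unshifted peak matches, so the score drops. This is why a hybrid search can suggest tentative identifications for compounds that are absent from the library but structurally related to an entry in it.
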
We describe a method to estimate statistical significance of frameshift alignments, similar to classic BLAST statistics. (BLAST presently does not permit its alignments to include frameshifts.) We also illustrate the continuing usefulness of frameshift alignment with two 'post-genomic' applications: (i) when finding pseudogenes within the human genome, frameshift alignments show that most anciently conserved non-coding human elements are recent pseudogenes with conserved ancestral genes; and (ii) when analyzing metagenomic DNA reads from polluted soil, frameshift alignments show that most alignable metagenomic reads contain frameshifts, suggesting that metagenomic analysis needs to use frameshift alignment to derive accurate results.
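
For orientation, classic BLAST statistics assign an alignment with raw score S an expected number of chance hits E = K·m·n·exp(−λS), where m and n are the query and database lengths. The sketch below computes that quantity and the corresponding bit score. It shows only the classic formula; the parameters λ and K for frameshift alignment would have to be estimated as the authors describe, and the numeric values used here are illustrative placeholders, not values from this work.

```python
import math

def evalue(score, m, n, lam, K):
    """Karlin-Altschul E-value: expected number of chance alignments
    scoring at least `score` for sequences of length m and n."""
    return K * m * n * math.exp(-lam * score)

def bit_score(score, lam, K):
    """Normalized bit score, so that E = m * n * 2 ** (-bit_score)."""
    return (lam * score - math.log(K)) / math.log(2)

# Illustrative placeholder parameters (assumptions, not from the paper).
lam, K = 0.267, 0.041
m, n = 300, 1_000_000  # hypothetical query and database lengths

for raw in (40, 60, 80):
    print(raw, bit_score(raw, lam, K), evalue(raw, m, n, lam, K))
```

Because E falls exponentially with the raw score, small changes in λ shift significance estimates considerably, which is why parameters fitted to the specific alignment model matter.
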