Compound annotation using spectral-matching algorithms is vital for
tandem mass spectrometry (MS/MS)-based metabolomics research, but is hindered by the lack
of high-quality reference MS/MS library spectra. Finding and removing
errors from libraries, including noise ions, is mostly done manually.
This process is both error-prone and time-consuming. To address these
challenges, we have developed an automated library curation pipeline,
LibGen, as a universal tool for building new spectral libraries. This pipeline
corrects mass errors, denoises spectra using subformula assignments,
and performs quality control of the reference spectra by calculating
explained intensity and spectral entropy. We employed LibGen to generate
three high-quality libraries from chemical standards of 2241 natural
products. To this end, we used an Orbitrap IQ-X mass spectrometer
to acquire 1947 classic higher-energy collisional dissociation (HCD)
spectra as well as 1093 ultraviolet photodissociation (UVPD) spectra.
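The abstract does not describe how LibGen corrects mass errors. As a minimal sketch only, assuming a simple relative recalibration (a swapped-in technique, not necessarily LibGen's method), one can measure the signed ppm error of ions whose exact m/z is known, such as the precursor of a chemical standard, and remove the median offset from every peak. The names `ppm_error` and `recalibrate` are hypothetical.

```python
from statistics import median


def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million (ppm)."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def recalibrate(mzs, reference_pairs):
    """Hypothetical correction: remove the median ppm error of reference ions.

    `reference_pairs` holds (observed_mz, theoretical_mz) tuples for ions whose
    exact mass is known, such as the precursor ion of a chemical standard.
    """
    shift_ppm = median([ppm_error(obs, theo) for obs, theo in reference_pairs])
    # Dividing by (1 + shift) undoes a relative (ppm) offset exactly.
    return [mz / (1 + shift_ppm * 1e-6) for mz in mzs]


# A peak list measured 10 ppm too high, anchored on a precursor at m/z 100.0.
print(recalibrate([100.001, 200.002], [(100.001, 100.0)]))  # values very close to 100.0 and 200.0
```

A relative (ppm) shift is used here because Orbitrap and TOF mass errors are usually reported in ppm; a real pipeline might instead fit a mass-dependent calibration curve.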
The third library was generated with electron-activated dissociation
(EAD) on a ZenoTOF 7600 mass spectrometer, yielding 3244 MS/MS
spectra. The natural products covered 140 chemical classes, from prenol
lipids to benzopyrans, with more than 97% of the compounds showing
Tanimoto similarities below 0.2, demonstrating very high structural
diversity. Based on spectral entropy calculations, both UVPD and EAD
spectra showed much higher information content than classic HCD spectra.
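Spectral entropy is commonly defined as the Shannon entropy of the normalized fragment intensities, S = -sum_i p_i ln(p_i) with p_i = I_i / sum_j I_j. The minimal Python sketch below assumes that definition (natural logarithm, no intensity weighting); the function name `spectral_entropy` is hypothetical and not taken from LibGen.

```python
import math


def spectral_entropy(intensities):
    """Shannon entropy S = -sum(p_i * ln p_i) of the normalized peak intensities."""
    total = sum(intensities)
    if total <= 0:
        raise ValueError("spectrum has no positive intensity")
    entropy = 0.0
    for intensity in intensities:
        if intensity > 0:  # zero-intensity peaks contribute nothing
            p = intensity / total
            entropy -= p * math.log(p)
    return entropy


# One dominant fragment versus intensity spread evenly over three fragments.
print(spectral_entropy([100.0, 1.0, 1.0]) < spectral_entropy([34.0, 33.0, 33.0]))  # True
```

A spectrum whose intensity is spread across many fragments scores higher than one dominated by a single peak, which is why a higher entropy is read as higher information content.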
We validated the denoising algorithm by acquiring MS/MS spectra of
chemical standards at high concentration and after 13-fold dilution. At
low concentrations, a higher proportion of spectra contained apparent
fragment ions that could not be explained as subformulas of the parent
molecule. Spectra in which noise ions accounted for more than 10% of the
total MS/MS fragment intensity were considered low quality and were
excluded from the libraries. Because the overall process is fully
automated, LibGen can be used by any researcher who creates or curates
mass spectral libraries. The libraries we created here are publicly
available at MassBank.us.
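The denoising and quality-control steps described above can be sketched as follows. This is a minimal illustration, not LibGen's implementation, and it rests on several stated assumptions: a protonated [M+H]+ precursor, singly charged fragment ions that are protonated subformulas of the neutral molecule (radical ions and hydrogen rearrangements are ignored), formulas containing only C, H, N and O, and a hypothetical 10 ppm matching tolerance. The names `subformula_mzs`, `denoise` and `passes_qc` are hypothetical.

```python
from itertools import product

# Monoisotopic masses (Da); only C, H, N and O are handled in this sketch.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}
PROTON = 1.00727646688  # mass of a proton (Da)


def subformula_mzs(formula):
    """m/z of every [sub + H]+ ion, where sub is a non-empty subformula of `formula`.

    `formula` is a dict such as {"C": 2, "H": 5, "N": 1, "O": 2} (neutral molecule).
    """
    elements = list(formula)
    mzs = []
    # Enumerate every element count from 0 up to the count in the parent formula.
    for counts in product(*(range(formula[e] + 1) for e in elements)):
        if sum(counts) == 0:
            continue
        mass = sum(n * MONOISOTOPIC[e] for e, n in zip(elements, counts))
        mzs.append(mass + PROTON)
    return mzs


def denoise(peaks, formula, tol_ppm=10.0):
    """Split (mz, intensity) peaks into subformula-explained peaks and noise peaks."""
    candidates = subformula_mzs(formula)
    explained, noise = [], []
    for mz, intensity in peaks:
        tol = mz * tol_ppm * 1e-6  # absolute tolerance (Da) for this peak
        if any(abs(mz - c) <= tol for c in candidates):
            explained.append((mz, intensity))
        else:
            noise.append((mz, intensity))
    return explained, noise


def passes_qc(peaks, formula, max_noise_fraction=0.10, tol_ppm=10.0):
    """Reject a spectrum when unexplained (noise) ions carry more than 10% of total intensity."""
    total = sum(intensity for _, intensity in peaks)
    if total <= 0:
        raise ValueError("spectrum has no positive intensity")
    _, noise = denoise(peaks, formula, tol_ppm)
    noise_fraction = sum(intensity for _, intensity in noise) / total
    return noise_fraction <= max_noise_fraction


glycine = {"C": 2, "H": 5, "N": 1, "O": 2}
# Precursor [M+H]+, the CH4N+ fragment of glycine, and one unexplainable peak.
peaks = [(76.0393, 100.0), (30.0338, 50.0), (55.5, 5.0)]
print(passes_qc(peaks, glycine))  # True: noise is 5/155, about 3% of intensity
```

Brute-force enumeration is adequate for small molecules; larger natural products would call for a smarter subformula search or a sorted candidate list with binary search.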