the MS-DIAL "bootstrap" version 6 (see Supplementary Methods). Next, unknown MS/MS spectra were elucidated by analyzing authentic standards, mining the literature, or predicting the putative structure from fragment ion evidence. After formulating the mass fragmentation scheme for a representative lipid structure in both ESI(+)-MS/MS and ESI(−)-MS/MS spectra, we expanded the scheme to various acyl chain species, using the heuristic MS/MS spectra in MSP format as references to filter out noisy spectra via a classical spectral similarity calculation 7. An example of this process is shown for N-acyl glycylserine, which is unique to MS-DIAL libraries (Fig. 1). After we confirmed that the lipid subclass-associated characteristic product ions and neutral losses scaled across various acyl chain species, a decision tree algorithm was used to assign an appropriate lipid structure annotation based on the MS/MS spectrum 8 (see additional details in Fig. 1). Finally, MS/MS spectral libraries and decision trees for 177 ionized forms of 117 lipid subclasses were integrated into MS-DIAL 4. The classifications follow the LipidMAPS 9 definition, and the structures are represented by a shorthand notation system 10 (Supplementary Figs. 2-7, Supplementary Table 2, and Supplementary Note 2): the specifications for the characters virgule "/", underscore "_", semicolon ";", rings/double bond equivalents, and atom strings such as "O-", "N-", and "P-" are fully detailed in Supplementary Note 2. Notably, these lipids were characterized in biological samples and formulated based on experimental data rather than in silico, with coverage exceeding that of existing lipidomics software programs (Table 1, Supplementary Table 3, and Supplementary Methods): MS-DIAL 4 extended the number of lipid subclasses in the database, yielding 3- and 1.5-fold coverage relative to the previous versions of MS-DIAL and to other software programs, respectively.
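The noise-filtering step described above can be illustrated with a minimal sketch. The text states only that a "classical spectral similarity calculation" was used; the cosine (normalized dot product) score, the 0.01 Da matching tolerance, the 0.7 cutoff, and the function names below are assumptions made for illustration and are not taken from the MS-DIAL source code.

```python
import math


def cosine_similarity(query, reference, tol=0.01):
    """Cosine score between two centroided spectra.

    Each spectrum is a list of (m/z, intensity) tuples. Peaks are paired
    greedily, highest intensity product first, when their m/z values
    differ by at most `tol` Da (an assumed tolerance).
    """
    # Collect every query/reference peak pair that lies within tolerance.
    pairs = []
    for i, (mz_q, int_q) in enumerate(query):
        for j, (mz_r, int_r) in enumerate(reference):
            if abs(mz_q - mz_r) <= tol:
                pairs.append((int_q * int_r, i, j))
    pairs.sort(reverse=True)

    # Use each peak at most once when summing the dot product.
    used_q, used_r = set(), set()
    dot = 0.0
    for product, i, j in pairs:
        if i in used_q or j in used_r:
            continue
        used_q.add(i)
        used_r.add(j)
        dot += product

    norm_q = math.sqrt(sum(inten * inten for _, inten in query))
    norm_r = math.sqrt(sum(inten * inten for _, inten in reference))
    if norm_q == 0.0 or norm_r == 0.0:
        return 0.0
    return dot / (norm_q * norm_r)


def filter_noisy(spectra, reference, threshold=0.7):
    """Keep only spectra scoring at least `threshold` against the
    heuristic reference spectrum (the cutoff is an assumed value)."""
    return [s for s in spectra if cosine_similarity(s, reference) >= threshold]
```

In this sketch a heuristic reference spectrum, for example one parsed from an MSP record, is compared with each experimental spectrum, and spectra below the cutoff are discarded as noisy. In practice the intensities are often square-root scaled or weighted by m/z before scoring; that choice is left open here.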
Moreover, MS-DIAL 4 provides access to decision tree-based annotation, which was lacking in prior versions. This yields an appropriate structure representation of 117 lipid subclasses through fragment evidence at the species, molecular species, and sn-position levels, so that lipidomics data can be translated unequivocally into biology to advance biomarker discovery, drug development, and clinical application.
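The way fragment evidence determines the annotation level can be sketched as a small rule-based decision procedure. This is a hypothetical illustration, not the MS-DIAL implementation: the `SubclassRule` structure, the tolerance and intensity thresholds, and all m/z values in the usage note are placeholders. Only the general logic follows the text: subclass-level diagnostic ions and neutral losses are checked first, and acyl chain-specific fragments then decide between species-level and molecular species-level output.

```python
from dataclasses import dataclass, field


@dataclass
class SubclassRule:
    """Hypothetical rule for one ionized form of one lipid subclass."""
    name: str
    diagnostic_ions: list = field(default_factory=list)  # required product ion m/z values
    neutral_losses: list = field(default_factory=list)   # required losses from the precursor (Da)


def has_peak(peaks, target_mz, tol=0.01, min_rel_intensity=0.01):
    """True if a peak within `tol` Da of `target_mz` reaches the minimum
    intensity relative to the base peak (both thresholds are assumed)."""
    base = max((inten for _, inten in peaks), default=0.0)
    if base == 0.0:
        return False
    return any(
        abs(mz - target_mz) <= tol and inten / base >= min_rel_intensity
        for mz, inten in peaks
    )


def annotate(peaks, precursor_mz, rule, total_chain, chain_candidates):
    """Return a shorthand annotation string, or None if the rule fails.

    `chain_candidates` is a list of (chain1, chain2, fragment_mzs) tuples
    listing the acyl chain-specific fragments expected for each candidate.
    """
    # Step 1: subclass-level evidence (characteristic ions and neutral losses).
    if not all(has_peak(peaks, mz) for mz in rule.diagnostic_ions):
        return None
    if not all(has_peak(peaks, precursor_mz - nl) for nl in rule.neutral_losses):
        return None

    # Step 2: acyl chain evidence gives molecular species level ("_" separator).
    for chain1, chain2, fragment_mzs in chain_candidates:
        if all(has_peak(peaks, mz) for mz in fragment_mzs):
            return f"{rule.name} {chain1}_{chain2}"

    # Fallback: species level, reporting only the summed chain composition.
    return f"{rule.name} {total_chain}"
```

For example, with a placeholder rule `SubclassRule("LX", [150.0], [50.0])`, a precursor at m/z 500.0, and a candidate `("16:0", "18:1", [255.2, 281.2])`, a spectrum containing all four expected peaks is reported as "LX 16:0_18:1", whereas a spectrum with only the subclass-level peaks is reported as "LX 34:1". An sn-position-level call (the "/" separator) would need additional position-specific evidence and is not covered by this sketch.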
Overall, we profiled 8,051 unique lipids from 117 lipid subclasses, of which 6,570 were characterized at the molecular species level, including confirmed acyl chain-specific fragments (Supplementary Table 4). All results, including MS-DIAL source code, mass spectral libraries, and semi-quantitative values defined as LSI level 2 or 3, are available on our RIKEN PRIMe website (http://prime.psc.riken.jp/) (Supplementary Data 1), and all MS raw data are available in the DropMet section under the indices DM0022, DM0030, and DM0031.
MS-DIAL 4 was validated using three LC-MS study subsets (Fig. 2). First, we processed NIST human plasma (SRM 1950) lipidomics data acquired on eight independent platforms with different extraction methods...