Mass spectrometry is a valuable method for evaluating the metabolomic content of a biological sample. The recent advent of rapid ionization technologies such as Laser Diode Thermal Desorption (LDTD) and Direct Analysis in Real Time (DART) has made high-throughput mass spectrometry possible, enabling large-scale comparative analysis of populations of samples. In practice, many factors arising from the environment, the protocol, and even the instrument itself can lead to minor discrepancies between spectra, which makes automated comparative analysis difficult. In this work, we propose a pipeline of algorithms to correct variations between spectra. The algorithms correct multiple spectra by identifying peaks that are common to all of them and, from those, computing a spectrum-specific correction. We show that these algorithms increase comparability within large datasets of spectra, facilitating comparative analyses such as machine learning.

1. $P$ contains exactly one peak from each spectrum in $S$.
2. The average of the m/z values of the peaks in $P$ is equal to $v$.
3. Every peak in $P$ has an m/z value located in the interval $[v(1 - w), v(1 + w)]$.
4. No other peak in $S$ has an m/z value that belongs to $[v(1 - w), v(1 + w)]$.
5. Every peak in $P$ has an intensity in the interval $[t_a, t_b]$.

If and only if all these criteria are satisfied, we say that $P$ is the set of peaks associated with the VLM $v$.

Note that we impose a lower intensity threshold $t_a$ because low-intensity peaks tend to have lower mass accuracy and can even be confused with noise. Accuracy issues also arise when the intensity of a peak is higher