Inconsistent peak picking outcomes
are a critical concern in processing
liquid chromatography–mass spectrometry (LC–MS)-based
untargeted metabolomics data. This work systematically studied the
mechanisms behind the discrepancies among five commonly used peak
picking algorithms: CentWave in XCMS, linear-weighted moving
average in MS-DIAL, automated data analysis pipeline (ADAP) in MZmine
2, Savitzky–Golay in El-MAVEN, and FeatureFinderMetabo in OpenMS.
We first collected 10 public metabolomics datasets representing various
LC–MS analytical conditions. We then incorporated several novel
strategies to (i) acquire the optimal peak picking parameters of each
algorithm for a fair comparison, (ii) automatically recognize false
metabolic features with poor chromatographic peak shapes, and (iii)
identify the true metabolic features that are missed by the algorithms.
By applying these strategies, we compared the true, false, and undetected
metabolic features in each data processing outcome. Our results show
that linear-weighted moving average consistently outperforms the other
peak picking algorithms. To facilitate a mechanistic understanding
of the differences, we proposed six peak attributes: ideal slope,
sharpness, peak height, mass deviation, peak width, and scan number.
We also developed an R program to automatically measure these attributes
for detected and undetected true metabolic features. From the results
of the 10 datasets, we concluded that four peak attributes, namely
ideal slope, scan number, peak width, and mass deviation, are critical
for the detectability of a peak. For instance, the emphasis that
linear-weighted moving average, Savitzky–Golay, and ADAP place on
ideal slope critically hinders their extraction of true metabolic
features with low ideal slope scores. The relationships between peak picking algorithms and peak
attributes were also visualized in a principal component analysis
biplot. Overall, the clear comparison and explanation of the differences
between peak picking algorithms can lead to the design of better peak
picking strategies in the future.
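
As a rough illustration of what measuring peak attributes can look like, the sketch below computes four of the six attributes named above (peak height, scan number, peak width, and mass deviation) for a single chromatographic peak. It is written in Python rather than R and is not the program developed in this work. The function name `measure_peak`, the `PeakAttributes` container, and the definitions used (peak width as the retention-time span, mass deviation as the m/z range in ppm of the mean m/z) are assumptions made for illustration only. Ideal slope and sharpness are omitted because their definitions are not given here.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PeakAttributes:
    """Hypothetical container; attribute definitions are illustrative assumptions."""
    peak_height: float         # maximum intensity across the peak
    scan_number: int           # number of scans (data points) in the peak
    peak_width: float          # retention-time span, same unit as the input
    mass_deviation_ppm: float  # m/z range relative to the mean m/z, in ppm


def measure_peak(rt: Sequence[float],
                 mz: Sequence[float],
                 intensity: Sequence[float]) -> PeakAttributes:
    """Measure simple attributes of one peak from its per-scan values.

    rt, mz, and intensity hold one value per scan and must have equal length.
    """
    if len(rt) == 0 or not (len(rt) == len(mz) == len(intensity)):
        raise ValueError("rt, mz, and intensity must be non-empty and equal in length")

    mean_mz = sum(mz) / len(mz)
    # Assumed definition: spread of measured m/z values, expressed in ppm.
    mass_deviation_ppm = (max(mz) - min(mz)) / mean_mz * 1e6

    return PeakAttributes(
        peak_height=max(intensity),
        scan_number=len(rt),
        peak_width=max(rt) - min(rt),
        mass_deviation_ppm=mass_deviation_ppm,
    )
```

Calling `measure_peak` on the scans of a detected feature and on those of an undetected one gives attribute values that can be compared side by side, which is the kind of comparison described above.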