False positive and false negative peaks detected from extracted ion chromatograms (EICs) are an urgent problem with existing software packages that preprocess untargeted liquid or gas chromatography-mass spectrometry metabolomics data, because they can translate downstream into spurious or missing compound identifications. We have developed new algorithms that carry out the sequential construction of EICs and the detection of EIC peaks. We compare the new algorithms to two popular software packages, XCMS and MZmine 2, and present evidence that the new algorithms detect significantly fewer false positives. In detecting compounds known to be present in the data, the new algorithms perform at least as well as XCMS and MZmine 2. Furthermore, we present evidence that a mass tolerance expressed in m/z should be favored over one expressed in ppm when constructing EICs. The mass tolerance parameter plays a critical role in EIC construction and can have an immense impact on the detection of EIC peaks.
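To make the distinction between the two tolerance units concrete, the following minimal Python sketch shows how an absolute m/z tolerance and a ppm tolerance translate into acceptance windows around a target mass. It is an illustration only, not the algorithm described in the abstract: the function names (`abs_window`, `ppm_window`, `build_eic`), the tuple layout of the data points, and all numeric values are assumptions chosen for the example.

```python
# Illustrative sketch only; names and values are hypothetical, not from the paper.

def abs_window(mz, tol_mz):
    """Acceptance window for an absolute tolerance given in m/z units."""
    return (mz - tol_mz, mz + tol_mz)


def ppm_window(mz, tol_ppm):
    """Acceptance window for a relative tolerance given in parts per million."""
    half_width = mz * tol_ppm / 1e6  # width grows linearly with m/z
    return (mz - half_width, mz + half_width)


def build_eic(points, target_mz, tol, unit="mz"):
    """Collect (rt, intensity) pairs whose m/z falls inside the window.

    points: iterable of (retention_time, mz, intensity) tuples (assumed layout).
    unit:   "mz" for an absolute tolerance, "ppm" for a relative one.
    """
    if unit == "mz":
        lo, hi = abs_window(target_mz, tol)
    elif unit == "ppm":
        lo, hi = ppm_window(target_mz, tol)
    else:
        raise ValueError("unit must be 'mz' or 'ppm'")
    return [(rt, inten) for rt, mz, inten in points if lo <= mz <= hi]


if __name__ == "__main__":
    # Made-up data points near m/z 200.
    points = [(1.0, 200.0000, 10.0), (2.0, 200.0015, 20.0), (3.0, 200.0040, 5.0)]
    print(build_eic(points, 200.0, 0.003, unit="mz"))
    print(build_eic(points, 200.0, 5.0, unit="ppm"))
```

The point of the sketch is that a ppm window widens in proportion to m/z, whereas an m/z window keeps the same width across the mass range, so the same data points can be grouped into an EIC under one unit and excluded under the other.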
XCMS and MZmine 2 are two widely used software packages for preprocessing untargeted LC/MS metabolomics data. Both construct extracted ion chromatograms (EICs) and detect peaks from the EICs, the first two steps in the data preprocessing workflow. While both packages have performed admirably in peak picking, they also detect a problematic number of false positive EIC peaks and can fail to detect real EIC peaks. These errors translate downstream into spurious and missing compounds, respectively, and are a significant limitation of most existing software packages that preprocess untargeted mass spectrometry metabolomics data. We seek to understand the specific reasons why XCMS and MZmine 2 find the false positive EIC peaks that they do and in what ways they fail to detect real compounds. We investigate differences between the EIC construction methods of XCMS and MZmine 2 and find several problems in the XCMS centWave peak detection algorithm, which we show are partly responsible for the false positive and false negative compound identifications. In addition, we find a problem with MZmine 2's use of centWave. We hope that a detailed understanding of the XCMS and MZmine 2 algorithms will allow users to work with them more effectively and will also help with future algorithmic development.