High false positive rates in GC-MS metabolomics peak detection are a common issue that impedes automated analysis of large-scale datasets, and there is a growing need to improve the reliability and scalability of data analysis workflows. Many algorithms are available for peak detection [1], a crucial step in the data analysis, but their performance and output can differ widely depending on both the algorithmic approach and the data acquisition method. This makes it difficult to compare algorithms without extensive manual intervention.
We present a workflow for improved peak picking (WiPP), a parameter-optimizing, multi-algorithm peak detection workflow for GC-MS metabolomics that automatically evaluates the quality of detected peaks using machine learning-based classification. First, the classifier is trained to distinguish real, compound-related peaks from false positive peaks. Then the parameters of a peak detection algorithm are scored based on the quality of the peaks it detects and are optimized accordingly. This procedure is repeated for two peak detection algorithms, and both algorithms are subsequently run in parallel on the entire data set with their optimized parameters. The qualitative information returned by the classifier for every peak is then used to merge the individual algorithm results into one final, high-confidence peak set.
Using this approach, we show that automated detection and evaluation of peak quality is improved. The additional quantitative and qualitative information generated by the classifier allows: (1) a novel way to classify peaks into seven classes and thus to assess their quality objectively; (2) an impartial performance comparison of different peak picking algorithms; (3) automated parameter optimization for each individual peak picking algorithm; and (4) generation of a final, improved, high-quality peak list for statistical or further analyses. WiPP achieves this while minimising the operator time required, by packaging these steps within a fully automated workflow.
The modular design allows the workflow to be extended, adjusted and improved using different or additional peak detection algorithms and classifiers. Importantly, because the implementation is fully automated, the workflow is suitable for large-scale studies.
The pipeline supports the mzML, mzData and NetCDF formats and is implemented in Python using Snakemake, a reproducible and scalable workflow management system. It is available on GitHub (https://github.com/bihealth/WiPP).
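
The scoring and merging steps described above can be illustrated with a minimal sketch. It is not the WiPP implementation: the `Peak` record, the single `confidence` value standing in for the seven-class classifier output, the scoring rule, the retention time tolerance and all function names are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class Peak:
    rt: float          # retention time (arbitrary units)
    confidence: float  # hypothetical classifier score in [0, 1]
    source: str        # name of the algorithm that detected the peak


def score_parameter_set(peaks: Sequence[Peak], threshold: float = 0.5) -> int:
    """Assumed scoring rule: reward confident peaks, penalise the rest."""
    good = sum(1 for p in peaks if p.confidence >= threshold)
    return good - (len(peaks) - good)


def best_parameters(results: Dict[str, List[Peak]], threshold: float = 0.5) -> str:
    """Return the key of the parameter set whose peaks score highest."""
    return max(results, key=lambda name: score_parameter_set(results[name], threshold))


def merge_peak_sets(a: Sequence[Peak], b: Sequence[Peak],
                    rt_tol: float = 1.0, threshold: float = 0.5) -> List[Peak]:
    """Merge two peak lists into one high-confidence list.

    Peaks below the threshold are dropped. When the two algorithms report
    peaks within rt_tol of each other, only the more confident one is kept.
    """
    candidates = sorted(
        (p for p in list(a) + list(b) if p.confidence >= threshold),
        key=lambda p: p.rt,
    )
    merged: List[Peak] = []
    for p in candidates:
        last = merged[-1] if merged else None
        if last and abs(p.rt - last.rt) <= rt_tol and p.source != last.source:
            # Same peak seen by both algorithms: keep the better-scored one.
            if p.confidence > last.confidence:
                merged[-1] = p
        else:
            merged.append(p)
    return merged


# Toy data: two algorithms, labelled "A" and "B" for illustration only.
peaks_a = [Peak(10.0, 0.9, "A"), Peak(20.0, 0.2, "A"), Peak(30.0, 0.6, "A")]
peaks_b = [Peak(10.4, 0.7, "B"), Peak(40.0, 0.8, "B")]
final_peaks = merge_peak_sets(peaks_a, peaks_b)
```

In this toy setting, `best_parameters` would be called once per algorithm over its candidate parameter sets, and `merge_peak_sets` would then combine the two optimized runs. A real classifier with several quality classes would need a richer merging rule than a single threshold.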