Clinical and public health microbiology is increasingly utilising whole genome sequencing (WGS) technology and this has lead to the development of a myriad of analysis tools and bioinformatics pipelines. However, in order to ensure robust methodologies suitable for clinical application of this technology, accurate, reproducible, traceable and benchmarked analysis pipelines are necessary. To date, the approach to benchmarking of these has been largely ad-hoc with new pipelines benchmarked on their own datasets with limited comparisons to previously published pipelines. In this study, Snpdragon, a fast and accurate SNP calling pipeline is introduced. Written in Nextflow, Snpdragon is capable of handling small to very large and incrementally growing datasets. Snpdragon is benchmarked using previously published datasets against six other all-in-one microbial SNP calling pipelines, Lyveset, Lyveset2, Snippy, SPANDx, BactSNP and Nesoni. The effect of dataset choice on performance measures is demonstrated to highlight some of the issues associated with the current benchmarking approaches. The establishment of an agreed upon gold-standard benchmarking process for microbial variant analysis is becoming increasingly important to aid in its robust application, improve transparency of pipeline performance under different settings and direct future improvements and development of new pipelines. Snpdragon is available at https://github.com/FordeGenomics/SNPdragon.