Chemical multisensor devices need calibration algorithms to estimate gas concentrations. Their possible adoption as indicative air quality measurements devices poses new challenges due to the need to operate in continuous monitoring modes in uncontrolled environments. Several issues, including slow dynamics, continue to affect their real world performances. At the same time, the need for estimating pollutant concentrations on board the devices, especially for wearables and IoT deployments, is becoming highly desirable. In this framework, several calibration approaches have been proposed and tested on a variety of proprietary devices and datasets; still, no thorough comparison is available to researchers. This work attempts a benchmarking of the most promising calibration algorithms according to recent literature with a focus on machine learning approaches. We test the techniques against absolute and dynamic performances, generalization capabilities and computational/storage needs using three different datasets sharing continuous monitoring operation methodology. Our results can guide researchers and engineers in the choice of optimal strategy. They show that non-linear multivariate techniques yield reproducible results, outperforming linear approaches. Specifically, the Support Vector Regression method consistently shows good performances in all the considered scenarios. We highlight the enhanced suitability of shallow neural networks in a trade-off between performance and computational/storage needs. We confirm, on a much wider basis, the advantages of dynamic approaches with respect to static ones that only rely on instantaneous sensor array response. The latter have been shown to be best choice whenever prompt and precise response is needed.