We benchmarked three methodologies, proposed by three institutions, for calibrating a network of low-cost PM2.5 sensors on an hourly basis, using synthetically generated "true" concentrations and sensor measurements. The objective of the network calibrations was to correct the measurements of the more than 2000 sensors in the Netherlands for their sensitivity to (local) environmental conditions. The option of using real measurements was dropped because the low-cost sensors located sufficiently close to the 40 reference measurement locations were judged to provide insufficient spatial coverage to benchmark the proposed approaches. Instead, synthetic true concentrations were generated to enable validation at all sensor locations. Hourly actual sensor measurements and hourly actual fixed-site concentrations, together with interpolated concentration maps, served as the underlying data for generating the synthetic data sets, which cover a period of one month. The synthetic sensor measurement errors were constructed by sampling from a collection of differences between actual sensor values and actual measurements. Of the three calibration methods tested, two follow a similar approach, although they differ in, e.g., outlier analysis and the way sensors are grouped, and they yield comparable corrections to the raw sensor measurements. The third method applies significantly stricter rules for outlier selection and discards considerably more sensors because of insufficient quality. Differences between the methods become most apparent when the data are analyzed at shorter time scales. Two of the network calibration methods are shown to be better at correcting the hourly and daily bias.
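The construction of the synthetic sensor errors can be illustrated with a minimal sketch. It assumes, purely for illustration, that errors are additive and are drawn uniformly with replacement from one pooled set of differences; the function name `make_synthetic_sensor_values` and its arguments are hypothetical, and the actual procedure may condition the sampling on further variables.

```python
import numpy as np


def make_synthetic_sensor_values(true_conc, sensor_obs, reference_obs, seed=0):
    """Add empirically sampled errors to synthetic true concentrations.

    Assumption (not from the source): errors are additive and sampled
    uniformly with replacement from a single pooled collection.
    """
    rng = np.random.default_rng(seed)

    # Pool of observed differences: actual sensor value minus actual measurement.
    error_pool = np.asarray(sensor_obs, dtype=float) - np.asarray(reference_obs, dtype=float)

    # Drop pairs where either value is missing or non-finite.
    error_pool = error_pool[np.isfinite(error_pool)]

    # Draw one error per synthetic (hour, sensor) entry.
    true_conc = np.asarray(true_conc, dtype=float)
    sampled_errors = rng.choice(error_pool, size=true_conc.shape, replace=True)

    return true_conc + sampled_errors
```

Because the synthetic true concentration is known at every sensor location, a calibration method can then be scored by comparing its corrected output against `true_conc` directly.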