Context. The evolution of massive stars is not fully understood. The relation between different types of evolved massive stars is not clear, and the role of factors such as binarity, rotation or magnetism needs to be quantified. Aims. Several groups make available the results of 1D single stellar evolution calculations in the form of evolutionary tracks and isochrones. They use different stellar evolution codes for which the input physics and its implementation vary. In this paper, we aim to compare the currently available evolutionary tracks for massive stars. We focus on calculations that aim to reproduce the evolution of Galactic stars. Our main goal is to highlight the uncertainties in the predicted evolutionary paths. Methods. We compute stellar evolution models with the codes MESA and STAREVOL. We compare our results with those of four published grids of massive stellar evolution models (Geneva, STERN, Padova and FRANEC codes). We first investigate the effects of overshooting, mass loss, metallicity and chemical composition. We subsequently focus on rotation. Finally, we compare the predictions of published evolutionary models with the observed properties of a large sample of Galactic stars. Results. We find that all models agree well for the main sequence evolution. Large differences in luminosity and temperature appear for the post main sequence evolution, especially in the cool part of the Hertzsprung-Russell (HR) diagram. Depending on the physical ingredients, tracks of different initial masses can overlap, rendering any mass estimate doubtful. For masses between 7 and 20 M⊙, we find that the main sequence width is slightly too narrow in the Geneva models including rotation. It is too wide for the FRANEC models and much too wide for the STERN models. This conclusion is reached from the investigation of the HR diagram and from the evolution of the surface velocity as a function of surface gravity.
An overshooting parameter α between 0.1 and 0.2 in models with rotation is preferred to reproduce the main sequence width. Determinations of surface abundances of carbon and nitrogen are partly inconsistent and cannot be used at present to discriminate between the predictions of published tracks. For stars with initial masses larger than about 60 M⊙, the FRANEC models with rotation can reproduce the observations of luminous O supergiants and WNh stars, while the Geneva models remain too hot.