Aims: The Drag Temperature Model (DTM) is a semi-empirical model describing the temperature, density, and composition of the Earth's thermosphere. DTM2009 and DTM2000, together with the COSPAR reference models NRLMSISE-00 and JB2008, are evaluated in order to establish benchmark values for the new DTM models that will be developed in the framework of the Advanced Thermosphere Modelling and Orbit Prediction (ATMOP) project. Methods: The total density data used in this study, including the high-resolution CHAMP and GRACE data, cover the 200-1000 km altitude range and all levels of solar activity. DTM2009, which uses an improved DTM2000 algorithm, was constructed from most of the data assimilated in DTM2000, supplemented by CHAMP and GRACE data. The bias and precision of the four models are evaluated by comparison with the observations using a metric that consists of the mean, RMS, and correlation. In addition, the residuals are binned, which helps reveal specific model errors. Results: This evaluation shows that DTM2009 is the most precise model for the data that were assimilated. Comparison with independent density data shows that it is also the most accurate model overall and a significant improvement over DTM2000 under all conditions. JB2008 is the most accurate model below 300 km, JB2008 and DTM2009 perform best in the 300-500 km altitude range, and NRLMSISE-00 and DTM2009 are the most accurate above 500 km. The precision of JB2008 decreases with altitude, which is due to its modeling of variations, in local solar time and season in particular, through the exospheric temperature rather than separately for the individual constituents. Specific errors in DTM2009, for example those related to the solar activity proxy employed, will be fixed in the next model release, DTM2012. A specific analysis under geomagnetic storm conditions is outside the scope of the present paper.
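The evaluation metric named in the Methods (mean, RMS, and correlation, followed by binning of the residuals) can be illustrated with a minimal Python sketch. The abstract does not define the residual, so the sketch assumes it is the observed-to-modeled density ratio; the function names `evaluate_model` and `bin_residuals`, the choice of altitude as the binning coordinate, and the bin edges are all hypothetical and are not taken from the paper.

```python
import numpy as np


def evaluate_model(observed, modeled):
    """Return (mean ratio, RMS scatter of the ratio, correlation).

    Assumption: the residual is the observed-to-modeled density ratio.
    A mean ratio of 1 then indicates an unbiased model, and the RMS
    scatter about the mean is used as a measure of precision.
    """
    observed = np.asarray(observed, dtype=float)
    modeled = np.asarray(modeled, dtype=float)
    ratio = observed / modeled
    mean_ratio = ratio.mean()
    rms = np.sqrt(np.mean((ratio - mean_ratio) ** 2))
    # Pearson correlation between observed and modeled densities.
    corr = np.corrcoef(observed, modeled)[0, 1]
    return mean_ratio, rms, corr


def bin_residuals(ratio, coordinate, edges):
    """Mean ratio in each bin of a coordinate (altitude here, by assumption).

    Bins are half-open intervals [edges[k], edges[k+1]); bins without
    data are returned as NaN.
    """
    ratio = np.asarray(ratio, dtype=float)
    coordinate = np.asarray(coordinate, dtype=float)
    index = np.digitize(coordinate, edges) - 1
    means = np.full(len(edges) - 1, np.nan)
    for k in range(len(edges) - 1):
        selected = index == k
        if selected.any():
            means[k] = ratio[selected].mean()
    return means
```

Binning by altitude with edges such as 200, 300, 500, and 1000 km would mirror the altitude ranges quoted in the Results; the same helper could be applied to other coordinates (for example local solar time) to look for systematic model errors.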