Context. Photometric redshifts (photo-z's) have become an essential tool in extragalactic astronomy. Many current and upcoming observing programmes require highly accurate photo-z's to reach their scientific goals. Aims. Here we introduce PHAT, the PHoto-z Accuracy Testing programme, an international initiative to test and compare different methods of photo-z estimation. Methods. Two different test environments are set up, one (PHAT0) based on simulations to test the basic functionality of the different photo-z codes, and another one (PHAT1) based on data from the GOODS survey including 18-band photometry and ∼2000 spectroscopic redshifts. Results. The accuracy of the different methods is expressed and ranked by the global photo-z bias, scatter, and outlier rates. While most methods agree very well on PHAT0, there are differences in the handling of the Lyman-α forest for higher redshifts. Furthermore, different methods produce photo-z scatters that can differ by up to a factor of two even in this idealised case. A larger spread in accuracy is found for PHAT1. Few methods benefit from the addition of mid-IR photometry. The accuracy of the other methods is unaffected or suffers when IRAC data are included. Remaining biases and systematic effects can be explained by shortcomings in the different template sets (especially in the mid-IR) and the use of priors on the one hand and an insufficient training set on the other hand. Some strategies to overcome these problems are identified by comparing the methods in detail. Scatters of 4-8% in Δz/(1 + z) were obtained, consistent with other studies. However, somewhat larger outlier rates (>7.5% with Δz/(1 + z) > 0.15; >4.5% after cleaning) are found for all codes, and these can only partly be explained by AGN or issues in the photometry or the spec-z catalogue. Some outliers were probably missed in comparisons of photo-z's to other, less complete spectroscopic surveys in the past.
There is a general trend that empirical codes produce smaller biases than template-based codes. Conclusions. The systematic, quantitative comparison of different photo-z codes presented here is a snapshot of the current state of the art of photo-z estimation and sets a standard for the assessment of photo-z accuracy in the future. The rather large outlier rates reported here for PHAT1 on real data should be investigated further since they are most probably also present (and possibly hidden) in many other studies. The test data sets are publicly available and can be used to compare new, upcoming methods to established ones and help in guiding future photo-z method development.
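The three summary statistics named in the Results (bias, scatter, and outlier rate in Δz/(1 + z)) can be illustrated with a minimal Python sketch. The exact definitions are assumptions, since the abstract does not spell them out: here Δz is taken as z_phot − z_spec, the normalisation uses the spectroscopic redshift, outliers are objects with |Δz/(1 + z)| above the 0.15 threshold quoted above, and bias and scatter are the mean and standard deviation of the non-outliers. The function name `photoz_metrics` and its arguments are hypothetical.

```python
import numpy as np


def photoz_metrics(z_phot, z_spec, outlier_cut=0.15):
    """Bias, scatter and outlier rate of photo-z's against reference redshifts.

    Assumed conventions (not taken from the paper):
      dz = (z_phot - z_spec) / (1 + z_spec)
      outlier: |dz| > outlier_cut
      bias, scatter: mean and standard deviation of dz over non-outliers
    """
    z_phot = np.asarray(z_phot, dtype=float)
    z_spec = np.asarray(z_spec, dtype=float)

    # Normalised redshift error for every object
    dz = (z_phot - z_spec) / (1.0 + z_spec)

    # Flag catastrophic failures, then compute statistics on the rest
    is_outlier = np.abs(dz) > outlier_cut
    good = dz[~is_outlier]

    return {
        "outlier_rate": float(is_outlier.mean()),
        "bias": float(good.mean()) if good.size else float("nan"),
        "scatter": float(good.std()) if good.size else float("nan"),
    }


if __name__ == "__main__":
    # Toy values for illustration only, not PHAT data
    z_spec = [0.5, 1.0, 1.0, 2.0]
    z_phot = [0.53, 0.96, 1.04, 3.2]
    print(photoz_metrics(z_phot, z_spec))
```

Whether outliers are excluded before computing bias and scatter changes the numbers noticeably, so any comparison between codes should fix that choice, and the sign convention of Δz, in advance.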