Climate model emulators are widely used to generate temperature projections for climate scenarios, including in the recent Intergovernmental Panel on Climate Change Sixth Assessment Report. Here we evaluate the performance of a two‐layer energy balance model in emulating historical and future temperature projections from Coupled Model Intercomparison Project Phase 6 models. We find that emulation errors can be large (>0.5°C for SSP2‐4.5) and differ markedly between climate models, forcing scenarios, and time periods. Errors arise in emulating the near‐surface temperature response to both greenhouse gas and aerosol forcing; in some periods the errors due to these forcings oppose one another, giving the spurious impression of better emulator performance. Climate feedbacks are assumed constant in the emulator; introducing time‐varying or state‐dependent feedbacks may reduce prediction errors. Close emulations can be produced for a given period but, crucially, this does not guarantee reliable emulations of other scenarios and periods. Therefore, rigorous out‐of‐sample evaluation is necessary to characterize emulator performance.
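
For readers unfamiliar with this class of emulator, the following is a minimal sketch of a two‐layer energy balance model with a constant feedback parameter, written in its commonly used form (an upper layer exchanging heat with a deep‐ocean layer). It is illustrative only and is not the configuration evaluated here: the function name `two_layer_ebm`, the forward Euler time stepping, and all parameter values are assumptions chosen for the example.

```python
import numpy as np


def two_layer_ebm(forcing, lam=1.2, gamma=0.7, c_upper=8.0, c_deep=100.0, dt=1.0):
    """Integrate a two-layer energy balance model with forward Euler steps.

    All default parameter values are illustrative assumptions.
    forcing : effective radiative forcing at each time step (W m^-2)
    lam     : climate feedback parameter, held constant (W m^-2 K^-1)
    gamma   : heat exchange coefficient between layers (W m^-2 K^-1)
    c_upper : heat capacity of the upper layer (W yr m^-2 K^-1)
    c_deep  : heat capacity of the deep layer (W yr m^-2 K^-1)
    dt      : time step (yr)
    """
    forcing = np.asarray(forcing, dtype=float)
    t_surf = np.zeros(forcing.size)  # upper-layer (near-surface) anomaly, K
    t_deep = np.zeros(forcing.size)  # deep-layer anomaly, K
    for i in range(1, forcing.size):
        # Heat flux from the upper layer into the deep layer
        exchange = gamma * (t_surf[i - 1] - t_deep[i - 1])
        # Upper layer: forcing minus radiative response minus deep-ocean uptake
        net_upper = forcing[i - 1] - lam * t_surf[i - 1] - exchange
        t_surf[i] = t_surf[i - 1] + dt * net_upper / c_upper
        # Deep layer warms only through exchange with the upper layer
        t_deep[i] = t_deep[i - 1] + dt * exchange / c_deep
    return t_surf, t_deep


# Usage: response to a constant (step) forcing of 3.7 W m^-2 over 5000 years
step_forcing = np.full(5000, 3.7)
t_surf, t_deep = two_layer_ebm(step_forcing)
```

Because `lam` is fixed, the long‐term surface warming under constant forcing tends toward `forcing / lam` regardless of the path taken. A time‐varying or state‐dependent feedback would replace this constant with a function of time or temperature.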