The purpose of this study was to evaluate the historical skill of models in the Coupled Model Intercomparison Project Phase 5 (CMIP5) in two regions of Ethiopia: northwestern Ethiopia and the Awash, one of the main Ethiopian river basins. An ensemble of CMIP5 models was first selected so that atmosphere-only (Atmospheric Model Intercomparison Project, AMIP) and fully coupled simulations could be compared directly, allowing the effects of sea surface temperature (SST) biases in the coupled models to be assessed. The annual cycle, seasonal biases, trends, and variability were used as metrics of model skill. In the Awash basin, both coupled and AMIP simulations produced a late Belg (March-May, MAM) rainy season. Relatedly, most models also missed the June rainfall minimum entirely. Northwestern Ethiopia, which has a unimodal rainfall cycle in observations, had a bimodal seasonal cycle in the models, even in the AMIP simulations. Significant AMIP biases in these regions show that the model biases cannot be attributed to SST biases alone. Similarly, no clear connection between model resolution and skill was found. Models simulated temperature with more skill than rainfall, but they underestimated trends in Belg (MAM or April-May, AM) and overestimated trends in Kiremt (July-September, JAS, or June-September, JJAS). The models that showed the most skill across a range of categories were HadGEM2-AO, GFDL-CM3, and MPI-ESM-MR. The biases and the differences in model skill across rainfall and temperature metrics found in this study provide a useful basis for a process-based analysis of the CMIP5 ensemble in Ethiopia.