Recent large earthquakes that caused great damage in areas predicted to be relatively safe illustrate the importance of criteria for assessing how well the earthquake hazard maps used to develop codes for earthquake-resistant construction actually perform. At present, there is no agreed-upon way of assessing how well a map performed and thus of determining whether one map performed better than another. The fractional site exceedance metric implicit in current maps, namely that during the chosen time interval the predicted ground motion will be exceeded at only a specific fraction of the sites, is useful. However, it permits a map to be nominally successful even though it significantly underpredicts or overpredicts shaking, or nominally unsuccessful even though it predicts shaking well. We explore some possible metrics that better measure the effects of overprediction and underprediction, and that can be weighted to treat the two differently and to reflect differences in the population and property at risk. Although no single metric fully characterizes map behavior, using several metrics can provide useful insight for comparing and improving hazard maps. For example, both probabilistic and deterministic hazard maps for Italy dramatically overpredict the recorded shaking in a 2200-yr-long historical intensity catalog, indicating problems in the data (most likely), the models, or both.
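The contrast between the fractional site exceedance metric and a metric sensitive to the size of the misfit can be illustrated with a minimal sketch. It is an illustration under stated assumptions, not the formulation used in this study: the function names, the use of a weighted mean squared difference as the misfit measure, the form of the weights, and all numerical values are hypothetical and chosen only to make the point concrete.

```python
from typing import Optional, Sequence


def exceedance_fraction(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Fraction of sites where the observed maximum shaking exceeded the mapped value."""
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("predicted and observed must be non-empty and of equal length")
    exceeded = sum(1 for p, o in zip(predicted, observed) if o > p)
    return exceeded / len(predicted)


def exceedance_misfit(predicted: Sequence[float], observed: Sequence[float],
                      target_fraction: float) -> float:
    """Absolute difference between the actual and the intended exceedance fraction."""
    return abs(exceedance_fraction(predicted, observed) - target_fraction)


def weighted_squared_misfit(predicted: Sequence[float], observed: Sequence[float],
                            under_weight: float = 1.0, over_weight: float = 1.0,
                            site_weights: Optional[Sequence[float]] = None) -> float:
    """Weighted mean squared difference between observed and predicted shaking.

    Assumed form (hypothetical): under_weight scales sites where shaking was
    underpredicted, over_weight scales all other sites, and site_weights could
    stand for the population or property at risk at each site.
    """
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("predicted and observed must be non-empty and of equal length")
    if site_weights is None:
        site_weights = [1.0] * len(predicted)
    if len(site_weights) != len(predicted):
        raise ValueError("site_weights must match the number of sites")
    total = 0.0
    for p, o, w in zip(predicted, observed, site_weights):
        asymmetry = under_weight if o > p else over_weight
        total += w * asymmetry * (o - p) ** 2
    return total / sum(site_weights)


# Invented values in arbitrary intensity units, for illustration only.
observed = [4.0, 5.0, 6.0, 5.0, 4.0, 6.0, 5.0, 4.0, 5.0, 6.0]
# Map A matches a 10% target exceedance fraction but overpredicts strongly.
map_a = [9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 5.0]
# Map B misses the target fraction but is within 0.5 units at every site.
map_b = [4.5, 5.5, 5.5, 5.5, 4.5, 5.5, 5.5, 4.5, 5.5, 5.5]

for name, predicted in (("A", map_a), ("B", map_b)):
    print(name,
          exceedance_misfit(predicted, observed, target_fraction=0.1),
          weighted_squared_misfit(predicted, observed))
```

With these invented values, map A is nominally successful by the exceedance criterion yet has a much larger squared misfit than map B, which is nominally unsuccessful but predicts the shaking closely. Raising `under_weight` above `over_weight` penalizes underprediction more heavily, and `site_weights` lets densely populated sites count for more.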