This reprint differs from the original in pagination and typographic detail.
project was initiated to create a variety of earthquake forecast models for seismic hazard assessment in California. Unlike previous projects that addressed the assessment of models for seismic hazard, the RELM participants decided to adopt many competing forecasting models and to test their performance rigorously and prospectively in a dedicated testing center. With the end of the RELM project, the forecast models became available, and the development of the testing center was carried out within the scope of CSEP. Many point process models, including multiple variants of the Epidemic-Type Aftershock Sequence (ETAS) models of Ogata (1998), have now been proposed and are part of RELM and CSEP, though the problem of how to compare such models and evaluate their goodness of fit remains quite open. In RELM, a community consensus was reached that all entered models be subjected to certain tests, including the Number or N-test, which compares the total forecasted rate with the observation; the Likelihood or L-test, which assesses the quality of a forecast in terms of overall likelihood; and the Likelihood-Ratio or R-test, which assesses the relative performance of two forecast models compared with what is expected under one proposed model. However, over time several drawbacks of these tests were discovered [Schorlemmer et al. (2010)] and the need for more powerful tests became clear. The N-test and L-test simply compare the quantiles of the total numbers of events in each bin, or of the likelihood within each bin, to those expected under the given model, and the resulting low-power tests are typically unable to discern significant lack of fit unless the overall rate of the model fits extremely poorly. Further, even when the tests do reject a model, they do not typically indicate where or when the model fits poorly, or how it could be improved.
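To make the N-test concrete, the following minimal sketch computes the two tail probabilities of the observed total count under a forecast. It assumes that the forecast is given as expected counts per bin and that the total count is Poisson distributed under the model; the function names and the example rates are hypothetical and chosen only for illustration, and the sketch is not the testing-center implementation.

```python
import math


def poisson_cdf(k, mean):
    """Return P(X <= k) for X ~ Poisson(mean), with mean > 0."""
    if k < 0:
        return 0.0
    log_mean = math.log(mean)
    # Sum the probability mass function in log space to avoid overflow
    # of mean**i and i! when the expected count is large.
    total = sum(
        math.exp(i * log_mean - mean - math.lgamma(i + 1))
        for i in range(k + 1)
    )
    return min(total, 1.0)


def n_test(bin_rates, observed_count):
    """Tail probabilities of the observed total count under the forecast.

    bin_rates: expected number of events in each bin (hypothetical input).
    observed_count: total number of events actually observed.
    Returns (delta1, delta2), where delta1 = P(X >= observed_count) and
    delta2 = P(X <= observed_count) for X ~ Poisson(sum(bin_rates)).
    """
    total_rate = sum(bin_rates)
    delta1 = 1.0 - poisson_cdf(observed_count - 1, total_rate)
    delta2 = poisson_cdf(observed_count, total_rate)
    return delta1, delta2


# Illustrative forecast: three bins with a total expected count of 4.
delta1, delta2 = n_test([0.5, 1.5, 2.0], observed_count=4)
```

A very small delta1 would suggest that the forecast underpredicts the total number of events, and a very small delta2 that it overpredicts. Because only the sum of the bin rates enters the calculation, two forecasts with the same total but very different spatial patterns receive identical scores, which illustrates why such a test cannot indicate where a model fits poorly.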
Meanwhile, the number of proposed spatial-temporal models for earthquake occurrences has grown, and the need to discriminate which models fit better than others has become increasingly important. Techniques for assessing goodness of fit are needed to pinpoint where existing models may be improved, and residual plots, rather than numerical significance tests, seem preferable for these purposes.
This paper proposes a new form of residual analysis for assessing the goodness of fit of spatial point process models. The proposed method compares the normalized observed and expected numbers of points over Voronoi cells generated by the observed point pattern. The method is applied here in particular to the examination of a version of the ETAS model originally proposed by Ogata (1998), and its goodness of fit to a sequence of 520 M ≥ 3 Hector Mine earthquakes occurring between October 1999 and December 2000. In particular, the Voronoi residuals indicate that the assumption of a constant background rate ρ in the ETAS model results in excessive smoothing