Spatio-temporal hierarchical modeling is an attractive way to model the spread of crime or terrorism over a given region, especially when the observations are counts and must be modeled discretely. The spatio-temporal diffusion is placed, as a matter of convenience, in the process model, allowing for straightforward estimation of the diffusion parameters through Bayesian techniques. However, this approach does not allow for self-excitation, a temporal dependency in the data model that has been shown to exist in crime and terrorism data. In this manuscript we use existing theories on how violence spreads to create models that allow for both spatio-temporal diffusion in the process model and temporal diffusion, or self-excitation, in the data model. We further demonstrate how Laplace approximations, similar to those used in Integrated Nested Laplace Approximation, can be used to conduct fast and accurate inference for self-exciting spatio-temporal models, giving practitioners a new way of fitting and comparing multiple process models. We illustrate this approach by fitting a self-exciting spatio-temporal model to terrorism data in Iraq and demonstrate how the choice of process model leads to differing conclusions about the existence of self-excitation in the data and about how violence spread in space and time in that country from 2003 to 2010.

1. Introduction. A typical spatio-temporal model consists of three levels: a data model, a process model, and a parameter model. A common way to model the data is then to assume that the observations $Y(\cdot)$ are conditionally independent given the process $X(\cdot)$. For example, if observations are made over areal regions $s_i$ at discrete time periods $t$, and the $Y(s_i, t)$ are counts, a common model is $Y(s_i, t) \mid X(s_i, t) \sim \text{Pois}(\exp(X(s_i, t)))$.
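As a minimal sketch of this conditionally independent data model, the snippet below draws counts given a latent field on a regions-by-time grid. The latent field here is simply i.i.d. Gaussian noise standing in for a process model; the grid size, the standard deviation, the seed, and the function name `simulate_counts` are illustrative assumptions, not quantities from this manuscript.

```python
import numpy as np


def simulate_counts(X, rng):
    """Draw Y(s_i, t) | X(s_i, t) ~ Pois(exp(X(s_i, t))), independently per cell."""
    # exp(X) is the Poisson intensity in each region-time cell.
    return rng.poisson(np.exp(X))


rng = np.random.default_rng(0)

# Hypothetical grid: 5 areal regions observed over 4 discrete time periods.
n_regions, n_times = 5, 4

# Placeholder latent process (assumption): i.i.d. N(0, 0.5^2) values,
# with no spatial or temporal structure.
X = rng.normal(loc=0.0, scale=0.5, size=(n_regions, n_times))

# One count per region-time cell, conditionally independent given X.
Y = simulate_counts(X, rng)
```

Given `X`, each count depends only on its own cell, so any spatial or temporal dependence in `Y` has to enter through the structure placed on `X`.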
The spatio-temporal diffusion structure is then commonly placed on the process model, which is usually assumed to have a Gaussian joint distribution, $X \sim \text{Gaus}(0, Q^{-1}(\theta))$. Most analyses of these models use Bayesian techniques, which require a further parameter model for $\theta$. The challenge in these models