SUMMARY
Sensitivity analysis with synthetic models is widely used in seismic tomography as a means for assessing the spatial resolution of solutions produced by, in most cases, linear or iterative nonlinear inversion schemes. The most common type of synthetic reconstruction test is the so-called checkerboard resolution test in which the synthetic model comprises an alternating pattern of higher and lower wave speed (or some other seismic property such as attenuation) in 2-D or 3-D. Although originally introduced for application to large inverse problems for which formal resolution and covariance could not be computed, these tests have achieved popularity, even when resolution and covariance can be computed, by virtue of being simple to implement and providing rapid and intuitive insight into the reliability of the recovered model. However, checkerboard tests have a number of potential drawbacks, including (1) only providing indirect evidence of quantitative measures of reliability such as resolution and uncertainty, (2) giving a potentially misleading impression of the range of scale-lengths that can be resolved, and (3) not giving a true picture of the structural distortion or smearing that can be caused by the data coverage. The widespread use of synthetic reconstruction tests in seismic tomography is likely to continue for some time yet, so it is important to implement best practice where possible. The goal of this paper is to develop the underlying theory and carry out a series of numerical experiments in order to establish best practice and identify some common pitfalls. Based on our findings, we recommend (1) the use of a discrete spike test involving a sparse distribution of spikes, rather than the use of the conventional tightly spaced checkerboard; (2) using data coverage (e.g. ray-path geometry) inherited from the model constrained by the observations (i.e. the same forward operator or matrix), rather than the data coverage obtained by solving the forward problem through the synthetic model; (3) carrying out multiple tests using structures of different scale length; (4) taking special care with regard to what can be inferred when using synthetic structures that closely mimic what has been recovered in the observation-based model; (5) investigating the range of structural wavelengths that can be recovered using realistic levels of imposed data noise; and (6) where feasible, assessing the influence of model parametrization error, which arises from making a choice as to how structure is to be represented.
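Recommendations (1), (2) and (5) can be illustrated with a minimal sketch, given here under stated assumptions rather than as the procedure used in this paper. It assumes a linear problem d = G m on a toy 1-D grid, a hypothetical random "ray" matrix standing in for real ray-path coverage, and damped least squares as a stand-in inversion scheme. All function names and parameter values (`make_ray_matrix`, `damped_least_squares`, `spike_test`, the damping and noise levels) are illustrative choices, not taken from the paper. The key point is that the synthetic data are computed with the same matrix G that would be used for the observations, and that noise is added before inverting.

```python
import numpy as np


def make_ray_matrix(n_cells, n_rays, rng):
    """Hypothetical stand-in for ray coverage on a 1-D grid.

    Each row is one 'ray' with unit path length in a contiguous run of cells.
    """
    G = np.zeros((n_rays, n_cells))
    for i in range(n_rays):
        start = rng.integers(0, n_cells - 1)         # first cell crossed
        stop = rng.integers(start + 1, n_cells + 1)  # one past the last cell
        G[i, start:stop] = 1.0
    return G


def damped_least_squares(G, d, damping):
    """Minimise ||G m - d||^2 + damping^2 ||m||^2 via the normal equations."""
    n_cells = G.shape[1]
    A = G.T @ G + damping**2 * np.eye(n_cells)
    return np.linalg.solve(A, G.T @ d)


def spike_test(G, spike_cells, amplitude, noise_std, damping, rng):
    """Sparse spike test that reuses the fixed forward operator G."""
    m_true = np.zeros(G.shape[1])
    m_true[spike_cells] = amplitude                 # sparse, well-separated spikes
    d = G @ m_true                                  # same G as for the observations
    d_noisy = d + rng.normal(0.0, noise_std, size=d.shape)  # imposed data noise
    m_rec = damped_least_squares(G, d_noisy, damping)
    return m_true, m_rec


rng = np.random.default_rng(0)
G = make_ray_matrix(n_cells=40, n_rays=120, rng=rng)
m_true, m_rec = spike_test(
    G, spike_cells=[8, 20, 32], amplitude=1.0,
    noise_std=0.05, damping=0.5, rng=rng,
)
```

Comparing `m_rec` with `m_true` cell by cell shows how each isolated spike is attenuated or smeared by the coverage encoded in G. Repeating the call with different spike widths and noise levels would correspond to recommendations (3) and (5).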