Error diffusion is a popular halftoning algorithm that, in its most widely used form, is inherently serial. As a serial algorithm, it offers limited opportunity for large-scale parallelism. In some implementations, it may also result in excessive bus traffic between the on-chip processor and the off-chip memory used to store the modified continuous-tone image and the halftone image. We introduce a new error diffusion algorithm in which the image is processed in two groups of interlaced blocks. Within each group, the blocks may be processed entirely independently. In the first group, the error diffusion proceeds along an outward spiral from the center of the block. Errors along the boundaries of blocks in the first group are diffused into neighboring blocks in the second group, within which the error diffusion spirals inward. A tone-dependent error diffusion training framework is used to eliminate artifacts associated with the spiral scan paths. We demonstrate image quality that is close to that achieved by conventional line-by-line error diffusion.
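
As a rough illustration of the scan order described above, the following Python sketch enumerates an outward spiral over a square block and assigns blocks to two groups. It is only a sketch under stated assumptions: the function names (`outward_spiral`, `inward_spiral`, `block_group`), the clockwise turning direction, the choice of center for even block sizes, the use of a reversed outward spiral as the inward path, and the checkerboard reading of "interlaced" are all hypothetical and not taken from the algorithm itself. Error weights, thresholds, and the tone-dependent training are not modeled.

```python
def outward_spiral(n):
    """Visit every pixel of an n x n block once, starting at the center
    and spiraling outward (clockwise: right, down, left, up).
    Assumption: for even n the start is the upper-left of the four
    central pixels."""
    r = c = (n - 1) // 2
    path = [(r, c)]
    dirs = [(0, 1), (1, 0), (0, -1), (-1, 0)]  # right, down, left, up
    step, d = 1, 0
    while len(path) < n * n:
        # Each run length is used for two consecutive directions.
        for _ in range(2):
            dr, dc = dirs[d % 4]
            for _ in range(step):
                r, c = r + dr, c + dc
                # Positions outside the block are skipped, not stored.
                if 0 <= r < n and 0 <= c < n:
                    path.append((r, c))
            d += 1
        step += 1
    return path


def inward_spiral(n):
    """Assumed inward path: the outward spiral traversed in reverse,
    so the scan starts on the block boundary and ends at the center."""
    return outward_spiral(n)[::-1]


def block_group(block_row, block_col):
    """Assumed checkerboard interlacing: group 0 blocks spiral outward,
    group 1 blocks spiral inward. No two blocks of the same group share
    an edge under this assignment."""
    return (block_row + block_col) % 2
```

Under these assumptions, `outward_spiral(3)` starts at `(1, 1)` and ends at the corner `(0, 2)`, while `inward_spiral(3)` ends at `(1, 1)`. In this layout, the blocks in group 0 could be processed concurrently, followed by the blocks in group 1 once the boundary errors from their neighbors are available.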