packets is bounded by d, the authors have presented in [6] a routing algorithm that takes d + O(d/f(d)) steps and uses buffers of size O(f(d)); it is asymptotically optimal if f(d) is chosen to be a large constant. In our study, we assume all the processors operate in synchronous MIMD mode. At any time step, each processor can communicate with all of its grid neighbors and can both send and receive one packet along each mesh link. In addition, processors can also store packets in their own queues. This model (hereafter referred to as the base model) is the same as the ones used in [9, 10, 12-16, 22, 25].
The main disadvantage of the mesh topology is its large diameter, which has a direct impact on the communication times of many parallel algorithms. Augmenting arrays of processors with various faster mechanisms has been suggested as a means to speed up communication among the processors. Examples are meshes with multiple buses, which have a bus in each column and each row [1,20]; generalized meshes with multiple buses, which are composed of smaller meshes with multiple buses [8]; meshes with separable row and column buses, in which row/column buses can be separated into multiple shorter buses by turning bus switches on or off [24]; and reconfigurable meshes, in which links can be connected to form buses [2]. These enhanced meshes can solve problems that require only a limited amount of global communication significantly faster. This paper shows how to utilize these buses in some "high-bandwidth" routing problems.
In all the related bused mesh models, except reconfigurable meshes, broadcast buses are added to the base model. Each broadcast bus is connected to a set of processors. In each time step, only one processor attached to a bus can send a packet via the bus. In addition, a processor can receive packets from all the buses attached to it in a time step.
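The broadcast-bus rules can be made concrete with a small simulation of a single time step. The Python sketch below is illustrative only and assumes a mesh with one bus per row and one per column, as in [1,20]. The function names (`attached_buses`, `bus_step`, `receive`) are hypothetical. So is the first-come arbitration among competing senders: the model only requires that at most one processor sends on a bus per step and does not say how that processor is chosen.

```python
def attached_buses(proc):
    """Buses attached to processor (row, col): its row bus and its column bus."""
    row, col = proc
    return [("row", row), ("col", col)]


def bus_step(requests):
    """Resolve one time step of bus traffic.

    requests: iterable of (sender, bus, packet) tuples.
    Returns (on_bus, deferred): on_bus maps each used bus to the single
    packet it carries this step; deferred lists the requests that must wait.
    """
    on_bus = {}
    deferred = []
    for sender, bus, packet in requests:
        if bus not in attached_buses(sender):
            raise ValueError(f"{sender} is not attached to {bus}")
        if bus in on_bus:
            # Only one sender per bus per step. The first-come choice
            # here is an assumption, not part of the model.
            deferred.append((sender, bus, packet))
        else:
            on_bus[bus] = packet
    return on_bus, deferred


def receive(proc, on_bus):
    """A processor reads every bus attached to it in the same step."""
    return [on_bus[bus] for bus in attached_buses(proc) if bus in on_bus]


# Two processors contend for the bus of row 0; a third uses the bus of column 1.
requests = [
    ((0, 0), ("row", 0), "A"),
    ((0, 2), ("row", 0), "B"),
    ((2, 1), ("col", 1), "C"),
]
on_bus, deferred = bus_step(requests)
heard = receive((0, 1), on_bus)
```

In this example, processor (0, 1) sits on both the bus of row 0 and the bus of column 1, so it receives one packet from each in the same step, while the second sender on row 0 has to retry in a later step.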
A number of proposed bused mesh models assume the propagation delay of a bus to be a constant that is independent of the number of processors attached to it. This assumption is thought to be a reasonable one in practical situations [1,2,3,20,27]. However, Lu et al. [18] investigated physical implementations of buses and found that short buses and long buses do differ in performance and that the constant-delay assumption is more appropriate for short-bus models. In this paper, we assume the propagation delay of the buses to be one time step, as is also assumed in [1,8,17,27]. As we will see,
JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING 33, 84-90 (1996)
Routing with locality is studied for meshes with buses. In this problem, packets' distances are bounded by a value, d, which is less than the diameter of the network. This problem arises naturally when specific known algorithms are implemented on meshes. Solving this problem in ordinary meshes requires a routing time of at least d steps. To do better than this, we propose adding a kind of short bus to ordinary meshes. By using a technique which we call iterative walk-and-ride, we show how the routing ...