Abstract: We consider the placement problem as part of the CAD flow for massively parallel processor arrays (MPPAs). In contrast to traditional placers, which run on a workstation with one or several cores and can exploit parallelism only to a limited degree, we investigate running the placer on the target architecture itself. As the number of processor elements (PEs) in such a device scales, so too does the computational power available to the placer. This natural scaling helps avoid the long runtimes that afflict FPGA flows.

In this paper, we propose a distributed placer suitable for running on an MPPA. This placer takes advantage of the local interconnect fabric and may be efficiently coded on a simple, RISC-like core. We investigate the performance of this placer and compare it to traditional, simulated annealing-based placers using both unrealistic (but nearly optimal) and realistic (but suboptimal) annealing schedules.

On a simulated 1024-core (32 × 32) MPPA, the proposed algorithm furnishes placements within 5% of the optimal placement quality, a level competitive with that of the realistic, traditional placer. To do so, the distributed placer requires each PE to consider 1/256th as many swaps as the traditional placer, a computational advantage which scales favourably as the number of cores on the MPPA increases.