This thesis describes a parallel implementation of the timing-driven VPR 5.0 simulated-annealing placement engine. By partitioning the grid into regions and allowing distant data to grow stale, it is possible to consider a large number of nonconflicting moves in parallel and achieve a deterministic result. The full timingdriven placement algorithm is parallelized, including swap evaluation, boundingbox calculation and the detailed timing-analysis updates. The partitioned region approach slightly degrades the placement quality, but this is necessary to expose greater parallelism. We also suggest a method to recover the lost quality.In simulated annealing, runtime can be shortened at the expense of quality.Using this method, the serial placer can achieve a maximum speedup of 100X while quality metrics degrades as much as 100%. In contrast, the parallel placer can scale beyond 500X with all quality metrics degrading by less than 30%. Specifically, at the point where the parallel placer begins to dominate over the serial placer, the post-routing minimum channel width, wirelength and critical-path delay degrades 13%, 10% and 7% respectively on average compared to VPR's original algorithm, while achieving a 140X to 200X speedup 25 threads. Finally, it is shown that the amount of degradation in the parallel placer is independent of the number of threads used.ii
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.