“…Due to the threading overhead and irregular computational patterns of STA, the performance of CPU-based multi-threading usually saturates at around 8–16 threads [4,8]. To break this performance bottleneck, GPU acceleration for timing analysis has been further explored [8,17]. Wang et al. [17] proposed techniques to accelerate the look-up table interpolation used to compute cell delays during timing propagation, while other steps, such as net delay computation and levelization, still run on the CPU.…”
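The look-up table step can be made concrete with a short example. The following is a minimal sketch of bilinear interpolation over a two-dimensional NLDM-style delay table indexed by input slew and output load. It assumes a Liberty-like table layout and linear extrapolation outside the table range, since tools differ on that point. The function names (`lut_delay`, `batch_delays`) and the sample table values are hypothetical, and this is not the kernel from [17]. It only illustrates why the step suits GPU acceleration: every (slew, load) query is computed independently of the others.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def _segment(axis: Sequence[float], x: float) -> int:
    """Return index i of the axis segment [axis[i], axis[i+1]] used for x.

    The index is clamped to the first/last segment, so values outside the
    table range are linearly extrapolated (an assumption; tools differ).
    """
    i = bisect_right(axis, x) - 1
    return max(0, min(i, len(axis) - 2))


def lut_delay(slew_axis: Sequence[float], load_axis: Sequence[float],
              table: Sequence[Sequence[float]], slew: float, load: float) -> float:
    """Bilinearly interpolate a 2-D delay table.

    table[i][j] holds the delay at slew_axis[i] and load_axis[j].
    """
    i = _segment(slew_axis, slew)
    j = _segment(load_axis, load)
    # Fractional position of the query inside the selected cell.
    tx = (slew - slew_axis[i]) / (slew_axis[i + 1] - slew_axis[i])
    ty = (load - load_axis[j]) / (load_axis[j + 1] - load_axis[j])
    d00, d01 = table[i][j], table[i][j + 1]
    d10, d11 = table[i + 1][j], table[i + 1][j + 1]
    # Weighted sum of the four surrounding table entries.
    return ((1 - tx) * (1 - ty) * d00 + (1 - tx) * ty * d01
            + tx * (1 - ty) * d10 + tx * ty * d11)


def batch_delays(slew_axis: Sequence[float], load_axis: Sequence[float],
                 table: Sequence[Sequence[float]],
                 queries: List[Tuple[float, float]]) -> List[float]:
    """Evaluate many (slew, load) queries. Each query is independent,
    so on a GPU each one could map to its own thread."""
    return [lut_delay(slew_axis, load_axis, table, s, c) for s, c in queries]


# Hypothetical sample table (units are illustrative only).
SLEW_AXIS = [0.01, 0.05, 0.1]
LOAD_AXIS = [0.001, 0.01, 0.05]
TABLE = [
    [0.02, 0.04, 0.10],
    [0.03, 0.05, 0.12],
    [0.05, 0.07, 0.15],
]

if __name__ == "__main__":
    print(batch_delays(SLEW_AXIS, LOAD_AXIS, TABLE, [(0.03, 0.0055), (0.05, 0.01)]))
```

The loop in `batch_delays` has no dependencies between iterations, which is what makes this step a natural candidate for data-parallel execution. By contrast, steps such as levelization must respect the ordering of the timing graph.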