Coarse-grained reconfigurable architectures (CGRAs) are a power-efficient approach to hardware acceleration. However, few EDA tools exist for CGRAs. We develop a hardware-based placement and routing (P&R) approach for fully pipelined CGRAs mapped as FPGA overlays. The key idea is to use the available FPGA resources to replicate several mapping units, thus exploiting parallel execution, exploring area/execution-time trade-offs, and achieving near-optimal mapping solutions. Furthermore, our P&R provides portability and an incremental run-time approach. Compared with the VPR and CGRA-ME tools and with a time-multiplexed approach, our spatial mapping reduces P&R execution time and improves performance to up to hundreds of Gops/s by using fully pipelined architectures.