This work explores the placement and routing of machine learning applications' dataflow graphs on different heterogeneous coarse-grained reconfigurable architectures (CGRAs). We analyze three types of processing element (PE) heterogeneity: the first concerns the interconnection pattern, the second the kinds of operations a single PE can execute, and the last the PE buffer resources. This analysis aims to achieve a fair reduction in overall cost compared with a homogeneous CGRA architecture. We compare our results with the homogeneous case and with one of the state-of-the-art tools for placement and routing (P&R). Our algorithm executed, on average, 52% faster than VPR 8.1 (Versatile Place and Route), an open-source academic tool designed for the FPGA placement and routing phases, producing better mappings in 66% of the cases and equivalent results in 26% of the cases. Furthermore, when multiplier heterogeneity is considered, a heterogeneous architecture reduces cost without losing performance in 76% of the cases. We propose a novel heterogeneous buffer architecture that reduces buffer resources by 56.3% for K-means dataflow patterns. We also show that a heterogeneous border chess architecture outperforms a homogeneous one. In addition, our mapping reaches optimal instances of single-tree dataflows compared to the classical Lee/Choi and H-tree approaches.