Coarse-grained reconfigurable architectures (CGRAs) are a promising class of architectures that combine high performance and high power efficiency with flexibility. The computation-intensive portions of applications, i.e., loops, are often mapped onto CGRAs for acceleration, and loop pipelining techniques are usually used to exploit their parallelism. However, for nested loops, existing loop pipelining methods often result in poor hardware utilization and low execution performance. To tackle this problem, this paper makes three contributions: 1) we propose the use of affine transformations to facilitate nested loop pipelining; 2) based on the polyhedral model, we present a precise and general formulation of the nested loop pipelining problem on a CGRA; and 3) using insights from the problem formulation, we design a joint affine transformation and multipipeline merging approach to improve the performance of nested loops on a CGRA. Experimental results show that our approach improves the performance of nested loops by up to 35% on average compared with state-of-the-art techniques.