Abstract: The increased complexity of programming heterogeneous reconfigurable platforms requires a thorough understanding of application behavior, for which developers need sophisticated analysis tools. One particular problem that severely limits the performance gain of running applications on these platforms is the mapping of inappropriate kernels onto the reconfigurable fabric. Efficient porting of legacy applications to these emerging heterogeneous platforms demands code tuning that considers several critical points, such as proper kernel size and low memory communication overhead. Detailed profiling information is thus vital for efficient HW/SW co-design. To help address these issues, we developed the Q² profiling framework. It consists of two parts: an advanced memory access profiling toolset, which provides detailed information on the run-time memory access patterns of an application, and a statistical modeling framework, which predicts resource requirements early in the design phase based on software metrics. The code optimizations triggered by careful analysis of the profiling information are used to tailor existing applications for heterogeneous reconfigurable platforms. In this paper, we examine a real application in detail to show the potential of the proposed profiling framework. Experimental results show that a speedup of 1.3× is achieved by accelerating a merged kernel of four critical functions in the application.