In this study, we introduce a methodology that automatically transforms user applications written in C/C++, based on dynamic profiling, into a parallel representation consisting of coarse-grained tasks. Such a parallel representation is suitable for mapping applications onto heterogeneous SoCs. We present our approach for instrumenting the user application binary during compilation with parallel primitives that enable the runtime system to schedule and execute independent, computation-intensive, coarse-grained tasks concurrently. We use the proposed compilation and code transformation methodology to retarget each application for execution on a heterogeneous SoC composed of processor cores and accelerators. We demonstrate the capabilities of our integrated compile-time and runtime flow through task-level parallelization and functionally correct execution of real-world applications from the communication systems and radar processing domains. Specifically, we execute six distinct applications with different degrees of parallelism on four platforms: an eight-core general-purpose processor, a heterogeneous SoC simulator, and two heterogeneous SoCs based on the Xilinx Zynq UltraScale+ FPGA and the Nvidia Jetson AGX board. Our integrated approach offers a path forward for application developers to take full advantage of the target SoC without having to become hardware or parallel programming experts.