C/C++/OpenCL-based high-level synthesis (HLS) has become increasingly popular for field-programmable gate array (FPGA) accelerators in many application domains in recent years, thanks to its competitive quality of result (QoR) and short development cycle compared with the traditional register-transfer level (RTL) design approach. Yet, limited by the sequential C semantics, it remains challenging to adopt the same highly productive high-level programming approach in many other application domains, where coarse-grained tasks run in parallel and communicate with each other at a fine-grained level. While current HLS tools support task-parallel programs, productivity is greatly limited in the code development, correctness verification, and QoR tuning cycles, due to poor programmability, restricted software simulation, and slow code generation, respectively. Such limited productivity often defeats the purpose of HLS and hinders programmers from adopting HLS for task-parallel FPGA accelerators. In this paper, we extend the HLS C++ language and present a fully automated framework with programmer-friendly interfaces, universal software simulation, and fast code generation to overcome these limitations. Experimental results based on a wide range of real-world task-parallel programs show that, on average, the lines of kernel and host code are reduced by 22% and 51%, respectively, which considerably improves programmability. The correctness verification and iterative QoR tuning cycles are accelerated by 3.2× and 6.8×, respectively.