Heterogeneous systems consisting of multiple multi-core CPUs and many-core accelerators have recently come into wide use, and more and more parallel applications are developed in such a heterogeneous system. To fully utilize multiple compute devices to cooperatively and concurrently execute data-parallel kernels on heterogeneous systems, a feedback-based dynamic and elastic task scheduling scheme is proposed, which can provide a better load balance, a greater device utilization, and a lower scheduling overhead by flexibly and dynamically adjusting the workload between devices during execution. The proposed method is more suitable for data-parallel kernels whose computation and data are uniformly distributed, but is less suitable for data-parallel kernels whose computation and data are non-uniformly distributed. Thus, an asynchronous-based dynamic and elastic task scheduling scheme is proposed, which can avoid device underutilization, load imbalance across devices, and frequent kernel launches, interdevice data transfers and inter-device synchronizations by dynamically adjusting the chunk size according to the performance change during runtime. A series of experiments are conducted with 8 representative parallel applications on a hybrid CPU-GPU-MIC system, the results show that the proposed two interdevice task scheduling schemes can achieve the efficient CPU-GPU-MIC co-processing of different parallel applications by effectively partitioning work across devices.INDEX TERMS Data-parallel kernels, heterogeneous systems, many-core accelerators, multi-core CPUs, multi-device co-processing, parallel applications, task scheduling.