Programming on heterogeneous hardware architectures using OpenCL requires thorough knowledge of the hardware. Many High-Performance Domain-Specific Languages (HPDSLs) are aimed at simplifying the programming efforts by abstracting away hardware details, allowing users to program in a sequential style. However, most HPDSLs still require the users to manually map compute workloads to the best suitable hardware to achieve optimal performance. This again calls for knowledge of the underlying hardware and trial-and-error attempts. Further, very often they only consider an offloading mode where computeintensive tasks are offloaded to accelerators. During this offloading period, CPUs remain idle, leaving parts of the available computational power untapped. In this work, we propose a tool named OptCL for existing HPDSLs to enable a heterogeneous co-execution mode when capable where CPUs and accelerators can process data simultaneously. Through a static analysis of data dependencies among compute-intensive code regions and performance predictions, the tool selects the best execution schemes out of purely CPU/accelerator execution or co-execution. We show that by enabling co-execution on dedicated and integrated CPU-GPU systems up to 13× and 21× speed-ups can be achieved.