In light of recent hardware advances, general-purpose computing on graphics processing units (GPGPU) is becoming increasingly commonplace and demands novel programming models, owing to GPUs' radically different architecture. Most existing approaches to programming GPUs from a high-level programming language embed a domain-specific language (DSL) within a host metalanguage and then implement a compiler that maps programs written in that DSL to code in low-level languages such as OpenCL or CUDA. An alternative, underexplored approach is to compile a restricted subset of the host language itself directly down to OpenCL/CUDA. We believe more research is needed to compare these two approaches and their relative merits. As a step in this direction, we implemented a quick proof of concept of the alternative approach. Specifically, we extend the Repa library with a computeG function that offloads a computation to the GPU. As long as the requested computation meets certain restrictions, we compile it to OpenCL 2.0 using the recently added shared virtual memory feature. We successfully run nine benchmarks on an Intel integrated GPU. We obtain the expected GPU performance on six of those benchmarks and come close to the expected performance on two more. In this paper, we describe an offload primitive for Haskell, how to extend Repa to use it, and how to implement that primitive in the Intel Labs Haskell Research Compiler; we then evaluate the approach on nine benchmarks, comparing against two different CPUs and, for one benchmark, against handwritten OpenCL code.
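To illustrate the intended programming model, the following Haskell sketch shows how computeG might be used in place of Repa's existing computeP. The signature and the module name are assumptions made by analogy with Repa's CPU-side API; the paper's actual interface may differ.

```haskell
import Data.Array.Repa as R

-- Hypothetical usage sketch: computeG is assumed to mirror Repa's
-- computeP, evaluating a delayed array in parallel, except that the
-- computation is offloaded to the GPU via generated OpenCL 2.0 code
-- operating on shared virtual memory.
scaleOnGpu :: Array U DIM1 Double -> IO (Array U DIM1 Double)
scaleOnGpu arr = computeG (R.map (* 2) arr)
```

As with computeP, the delayed array passed to computeG would have to satisfy the restrictions the compiler imposes on offloadable computations; otherwise the program would fall back to (or be rejected for) CPU execution.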