The SEJITS framework supports creating embedded domain-specific languages (DSELs) and code generators, a pair of which is called a specializer, with much less effort than creating a full DSL compiler: typically just a few hundred lines of code. SEJITS' main benefit is allowing application writers to stay entirely in high-level languages such as Python by using specialized Python functions (that is, functions written in one of the Python-embedded DSELs) to generate code that runs at native speed. One existing SEJITS DSEL is Sepya [10], a Python DSEL for stencil computations that generates OpenMP and Cilk+ code competitive with existing DSL compilers such as Pochoir and Halide. We extend Sepya to generate OpenCL code targeting GPUs and, in the process, extend SEJITS with support for meta-specializers, whose job is to enable and optimize the composition of existing specializers written by third parties. In this work, we demonstrate meta-specialization by detecting and removing extraneous data copies to and from the GPU when composing multiple specializer calls (stencil and non-stencil). We also explore variants of loop fusion to further improve the performance of composing these operations. The generated stencil code is 20× faster than SciPy and competitive with existing stencil DSELs on realistic code excerpts. Since meta-specializers must compose and optimize specializers created by third parties, we extend SEJITS with support for meta-specializer hooks, allowing existing specializers to be incrementally enabled for meta-specialization without breaking backwards compatibility. 
The Sepya and SEJITS extensions together extend the range of platforms for which highly optimized code can be generated and open new possibilities for optimizing the composition of existing specializers.
Scientific application authors face a trade-off between productivity (being able to code in a natural language whose abstractions match their own problem domains) and high performance. In some simple cases, high-performance libraries such as BLAS and OSKI bridge this gap. For more sophisticated computations such as stencils, however, different problem instances share common structure and abstractions, yet the computations are too different to encapsulate using libraries alone. This observation has led to other approaches such as the use of domain-specific languages (DSLs) for stencils. By focusing on expressing what computation is to be done rather than how to do it, DSL compilers can often apply optimizations specific to both the computation and the target hardware whose applicability would be difficult to infer from imperative code expressing lower-level operations. In particular, embedded DSLs (DSELs) have the additional advantage [7] that programs can use most or all features of the embedding language, and that complete programs in the embedding language can include "subprograms" in multiple distinct embedded languages.
The SEJITS framework (SEJITS.org) provides the infrastructure for creating DSELs embedded in Python. The corres...