Programmers who need high performance currently rely on low-level, architecture-specific programming models (e.g. OpenMP for CMPs, CUDA for GPUs, MPI for clusters). Performance optimization with these frameworks usually requires expertise in the specific programming model and a deep understanding of the target architecture. Domain-specific languages (DSLs) are a promising alternative, allowing compilers to map problem-specific abstractions directly to low-level architecture-specific programming models. However, developing DSLs is difficult, and using multiple DSLs together in a single application is even harder because existing compiled solutions do not compose together. In this paper, we present four new performance-oriented DSLs developed with Delite, an extensible DSL compilation framework. We demonstrate new techniques to compose compiled DSLs embedded in a common backend together in a single program and show that generic optimizations can be applied across the different DSL sections. Our new DSLs are implemented with a small number of reusable components (less than 9 parallel operators total) and still achieve performance up to 125x better than library implementations and at worst within 30% of optimized stand-alone DSLs. The DSLs retain good performance when composed together, and applying cross-DSL optimizations results in up to an additional 1.82x improvement.
We study two models of connected pendulum clocks synchronizing their oscillations, a phenomenon originally observed by Huygens. The oscillation angles are assumed to be small so that the pendulums are modeled by harmonic oscillators, clock escapements are modeled by the van der Pol terms. The mass ratio of the pendulum bobs to their casings is taken as a small parameter. Analytic conditions for existence and stability of synchronization regimes, and analytic expressions for their stable amplitudes and period corrections are derived using the Poincaré theorem on existence of periodic solutions in autonomous quasi-linear systems. The anti-phase regime always exists and is stable under variation of the system parameters. The in-phase regime may exist and be stable, exist and be unstable, or not exist at all depending on parameter values. As the damping in the frame connecting the clocks is increased the in-phase stable amplitude and period are decreasing until the regime first destabilizes and then disappears. The results are most complete for the traditional three degrees of freedom model, where the clock casings and the frame are consolidated into a single mass.
Arbitrary program extension at run time in language-based VMs, e.g., Java's dynamic class loading, comes at a startup cost: high memory footprint and slow warmup. Cloud computing amplifies the startup overhead. Microservices and serverless cloud functions lead to small, self-contained applications that are started often. Slow startup and high memory footprint directly affect the cloud hosting costs, and slow startup can also break service-level agreements. Many applications are limited to a prescribed set of pre-tested classes, i.e., use a closed-world assumption at deployment time. For such Java applications, GraalVM Native Image offers fast startup and stable performance. GraalVM Native Image uses a novel iterative application of points-to analysis and heap snapshotting, followed by ahead-of-time compilation with an optimizing compiler. Initialization code can run at build time, i.e., executables can be tailored to a particular application configuration. Execution at run time starts with a pre-populated heap, leveraging copy-on-write memory sharing. We show that this approach improves the startup performance by up to two orders of magnitude compared to the Java HotSpot VM, while preserving peak performance. This allows Java applications to have a better startup performance than Go applications and the V8 JavaScript VM. CCS Concepts: • Software and its engineering → Runtime environments.
Deeply embedded domain-specific languages (EDSLs) intrinsically compromise programmer experience for improved program performance. Shallow EDSLs complement them by trading program performance for good programmer experience. We present Yin-Yang, a framework for DSL embedding that uses Scala macros to reliably translate shallow EDSL programs to the corresponding deep EDSL programs. The translation allows program prototyping and development in the user friendly shallow embedding, while the corresponding deep embedding is used where performance is important. The reliability of the translation completely conceals the deep embedding from the user. For the DSL author, Yin-Yang automatically generates the deep DSL embeddings from their shallow counterparts by reusing the core translation. This obviates the need for code duplication and leads to reliability by construction.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2025 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.