ARM 1
Abstract
Parallelization has been used to maintain a reasonable balance between energy consumption and performance in computing platforms, especially in modern multi- and many-core systems. This paper studies the interplay between performance and energy, and their relationships with parallelization scaling, in the context of the reliable operating region (ROR). It focuses on the effectiveness of parallelization scaling in throughput-power tradeoffs. Theoretical and experimental explorations show that the proposed method of binormalization of the ROR enables a meaningful cross-platform analysis of this interplay. The interplay is also captured in an online tool for finding optimal operating points.
Introduction
In digital CMOS circuits, a higher supply voltage usually permits a higher operating (clock) frequency on the same hardware platform, because capacitive loads are charged and discharged faster. The result is higher throughput. Dynamic voltage and frequency scaling (DVFS) adjusts the supply voltage and the clock frequency together. It can be used either to obtain the best throughput under a given power budget or to save power for a given throughput requirement [1].

If the computation can be parallelized, DVFS can be combined with parallelization, i.e., scaling to multiple computation units. This makes it possible to increase system throughput for a given power limit, or to reduce power while maintaining throughput [2]. A major challenge in precisely analyzing how well parallelization serves these goals is determining the parallelizability of a particular execution. Parallelizability depends on complex factors, such as software and hardware architecture details, and must be modelled on a per-execution basis [3]. A second challenge is that quantifying the power and/or throughput improvement of any DVFS decision requires complicated execution-dependent models [4].

This paper explores the interplay between DVFS and parallelization scaling with respect to performance and power. The interplay is captured using the concept of a reliable operating region (ROR), which can be established from knowledge of system reliability through experiments or simulations. The ROR therefore encapsulates platform- and application-specific details, which makes the subsequent analysis steps generic. The focus of this paper is the effectiveness of parallelization scaling. The ROR-based method can explore the entire voltage range of a platform, from the subthreshold to the super-threshold region.
The explorations and models presented in this paper confirm and explain the general view that combining DVFS with parallelization scaling is most advantageous when the supply voltage is scaled down to the near-threshold region. This is known as near-threshold