A power-saving approach for real-time systems that combines processor voltage scaling with task placement in hybrid memory is presented. The proposed approach incorporates the placement of each task between DRAM (dynamic random access memory) and NVRAM (nonvolatile random access memory) into the task model used for processor voltage scaling, and it selectively applies power-saving techniques to the processor and memory without violating deadline constraints. Unlike previous work, our model evaluates the worst-case execution time of a task tightly by accounting for processor and memory delays that may overlap, thereby reducing the power consumption of real-time systems by 18–88%.
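As a rough illustration of the idea, the following Python sketch checks deadlines under a simplified WCET model in which a fraction of the memory delay is hidden behind computation, and then picks the lowest-power frequency/placement pair. It is a hypothetical sketch only: the task parameters, latencies, power proxy, and overlap factor are illustrative assumptions, not values or code from the paper.

```python
# Hypothetical sketch: choose the lowest-power CPU frequency and memory
# placement (DRAM vs. NVRAM) for a task such that its deadline still holds.
# WCET model: cpu_cycles / freq plus the non-overlapped part of the memory
# delay (the "tight" WCET idea described above). All numbers are illustrative.

from dataclasses import dataclass

@dataclass
class Task:
    cpu_cycles: float      # worst-case CPU cycles
    mem_accesses: int      # worst-case memory accesses
    deadline: float        # relative deadline (seconds)

DRAM_LAT, NVRAM_LAT = 60e-9, 300e-9   # per-access latency (assumed)
DRAM_PWR, NVRAM_PWR = 1.0, 0.3        # relative memory power (assumed)
FREQS = [0.4e9, 0.8e9, 1.2e9, 1.6e9]  # available CPU frequencies (Hz)

def wcet(task: Task, freq: float, mem_lat: float, overlap: float = 0.5) -> float:
    cpu_time = task.cpu_cycles / freq
    mem_time = task.mem_accesses * mem_lat
    # Only the non-overlapped portion of the memory delay extends the WCET.
    return cpu_time + (1.0 - overlap) * mem_time

def choose_config(task: Task):
    """Return the lowest-power (freq, memory) pair that meets the deadline."""
    best = None
    for freq in FREQS:
        for mem_lat, mem_pwr, name in [(NVRAM_LAT, NVRAM_PWR, "NVRAM"),
                                        (DRAM_LAT, DRAM_PWR, "DRAM")]:
            if wcet(task, freq, mem_lat) <= task.deadline:
                power = (freq / FREQS[-1]) ** 3 + mem_pwr   # crude power proxy
                if best is None or power < best[0]:
                    best = (power, freq, name)
    return best

print(choose_config(Task(cpu_cycles=4e8, mem_accesses=2_000_000, deadline=0.6)))
```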
Rendering is the process of generating high-resolution images in software, and it is widely used in animation, video games, and visual effects in movies. Although rendering is a computation-intensive job, we observe that storage accesses can become another performance bottleneck in desktop rendering systems. In this article, we present a new buffer cache management scheme specialized for rendering systems. Unlike general-purpose computing systems, rendering systems exhibit specific file access patterns, and we show that this leads to significant performance degradation in the buffer cache system. To cope with this situation, we collect various file input/output (I/O) traces of rendering workloads and analyze their access patterns. The analysis shows that file I/Os in rendering processes consist of long loops for configuration, short loops for texture input, random reads for input, and single writes for output. Based on this observation, we propose a new buffer cache management scheme that improves the storage performance of rendering systems. Experimental results show that the proposed scheme improves storage I/O performance by 19% on average and by up to 55% compared to the conventional buffer cache system.
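The sketch below shows one way such pattern-aware caching might be structured: blocks seen in looping references are kept under an MRU-style policy, random reads fall back to LRU, and write-once output blocks are evicted first. The class name, classification heuristic, and eviction order are hypothetical assumptions for illustration, not the paper's actual scheme.

```python
# Hypothetical sketch of a pattern-aware buffer cache in the spirit of the
# access-pattern analysis above. Names and policies are illustrative only.

from collections import OrderedDict

class RenderingBufferCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.blocks = OrderedDict()          # block_id -> pattern tag
        self.history = {}                    # block_id -> access count

    def _classify(self, block_id, is_write):
        if is_write:
            return "write_once"              # single-write output block
        count = self.history.get(block_id, 0)
        return "loop" if count >= 2 else "random"

    def access(self, block_id, is_write=False):
        self.history[block_id] = self.history.get(block_id, 0) + 1
        hit = block_id in self.blocks
        if hit:
            self.blocks.move_to_end(block_id)        # refresh recency
        else:
            if len(self.blocks) >= self.capacity:
                self._evict()
            self.blocks[block_id] = None
        self.blocks[block_id] = self._classify(block_id, is_write)
        return hit

    def _evict(self):
        # Eviction priority: write-once first, then random reads in LRU order,
        # then the most-recently-used looping block (MRU within loops).
        for tag in ("write_once", "random"):
            for bid, t in self.blocks.items():       # oldest first
                if t == tag:
                    del self.blocks[bid]
                    return
        loop_blocks = [b for b, t in self.blocks.items() if t == "loop"]
        del self.blocks[loop_blocks[-1]]             # MRU looping block

cache = RenderingBufferCache(capacity=4)
hits = sum(cache.access(b) for b in [1, 2, 3, 1, 2, 3, 4, 5, 1, 2])
print("hits:", hits)
```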
A GPGPU (general-purpose graphics processing unit) provides hardware resources that can execute tens of thousands of threads simultaneously. In reality, however, this parallelism is limited because resources are allocated at the granularity of the thread block, which is not managed judiciously in current GPGPU systems. To schedule threads in a GPGPU, a specialized hardware scheduler allocates thread blocks to the computing units called SMs (streaming multiprocessors) in a Round-Robin manner. Although scheduling in hardware is simple and fast, we observe that Round-Robin scheduling is not efficient in GPGPUs, as it considers neither the workload characteristics of threads nor the resource balance among SMs. In this article, we present a new thread block scheduling model that can analyze and quantify the performance of thread block scheduling. We implement our model as a GPGPU scheduling simulator and show that the conventional thread block scheduling provided by GPGPU hardware does not perform well as the workload becomes heavy. Specifically, we observe that the performance degradation of Round-Robin can be eliminated by adopting DFA (Depth First Allocation), which is simple but scalable. Moreover, as our simulator is built in a modular form and is publicly released for other researchers to use, various scheduling policies can be incorporated into it to evaluate the performance of GPGPU schedulers.
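The toy model below illustrates the comparison described above; it is a hypothetical sketch, not the released simulator. Each SM is modeled as a fixed number of concurrent block slots, "RR" rotates blocks over the SMs regardless of their backlog, and "DFA" fills the lowest-indexed SM whose earliest slot frees soonest. The block durations and SM parameters are made up for illustration.

```python
# Hypothetical toy model of thread-block dispatch to SMs. Each SM runs at most
# `blocks_per_sm` thread blocks concurrently; each running block occupies one
# "slot" whose free time we track. The makespan is the finish time of the last
# thread block under the chosen policy.

def simulate(block_times, num_sms=2, blocks_per_sm=2, policy="RR"):
    # slots[s][k] = time at which slot k of SM s becomes free
    slots = [[0.0] * blocks_per_sm for _ in range(num_sms)]
    for i, dur in enumerate(block_times):
        if policy == "RR":
            sm = i % num_sms                       # rotate over SMs
        else:  # "DFA": lowest-index SM whose earliest slot frees soonest
            sm = min(range(num_sms), key=lambda s: (min(slots[s]), s))
        start = min(slots[sm])                     # block starts when a slot frees
        slots[sm][slots[sm].index(start)] = start + dur
    return max(max(s) for s in slots)

# Skewed block durations make the imbalance of Round-Robin visible.
blocks = [8, 1, 1, 1, 8, 1, 1, 1, 8, 1, 1, 1]
print("RR :", simulate(blocks, policy="RR"))       # 17.0 in this toy run
print("DFA:", simulate(blocks, policy="DFA"))      # 10.0 in this toy run
```

In this contrived run, Round-Robin keeps sending long blocks to the same already-loaded SM, while the depth-first/greedy allocation balances the load, which mirrors the degradation under heavy workloads discussed above.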