Summary
The increasing amount of resources available on current GPUs has sparked new interest in the problem of sharing those resources among different kernels. While new generations of GPUs support concurrent kernel execution, the scheduling decisions are made by the hardware at runtime. These hardware decisions, however, depend heavily on the order in which the kernels are submitted for execution. In this work, we propose a novel optimization approach that reorders kernel invocations to maximize resource utilization and thereby improve the average turnaround time. We model the assignment of kernels to the hardware resources as a series of knapsack problems and use a dynamic programming approach to solve them. We evaluate our method using kernels with different sizes and resource requirements. Our results show significant gains in average turnaround time and system throughput compared with the kernel submission scheme implemented in modern GPUs.
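The reordering idea above can be illustrated with a minimal sketch (not the paper's actual implementation): each scheduling round is treated as a 0/1 knapsack in which the capacity is an abstract GPU resource budget, each pending kernel's weight is its resource demand, and its value equals that demand so the dynamic program maximizes utilization per round. The names, units, and the single-resource simplification are assumptions for illustration.

```python
def knapsack_round(kernels, capacity):
    """kernels: list of (name, demand); returns the names chosen this round."""
    n = len(kernels)
    # dp[c] = best total utilization achievable with capacity c
    dp = [0] * (capacity + 1)
    # choice[i][c] = True if item i was taken when dp[c] was last improved
    choice = [[False] * (capacity + 1) for _ in range(n)]
    for i, (_, w) in enumerate(kernels):
        for c in range(capacity, w - 1, -1):  # descending: each item used once
            if dp[c - w] + w > dp[c]:
                dp[c] = dp[c - w] + w
                choice[i][c] = True
    # Backtrack from the last item to recover the chosen subset.
    chosen, c = [], capacity
    for i in range(n - 1, -1, -1):
        if choice[i][c]:
            chosen.append(kernels[i][0])
            c -= kernels[i][1]
    return chosen

def reorder(kernels, capacity):
    """Solve a series of knapsacks until every kernel has been scheduled."""
    pending, rounds = list(kernels), []
    while pending:
        picked = knapsack_round(pending, capacity)
        if not picked:  # a kernel larger than the capacity runs alone
            picked = [pending[0][0]]
        rounds.append(picked)
        pending = [k for k in pending if k[0] not in picked]
    return rounds
```

With a budget of 10 units and kernels A(5), B(4), C(3), D(8), the first round packs A and B (9 units used) rather than letting D block the queue, which is the utilization gain the submission order controls.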
GPUs have established a new baseline for power efficiency and computing power, delivering larger bandwidth and more computing units with each new generation. Modern GPUs support concurrent kernel execution to maximize resource utilization, allowing kernels to exploit resources that would otherwise sit idle. However, the decision to execute different kernels simultaneously is made by the hardware, and GPUs sometimes do not allow the execution of blocks from other kernels even when resources are available. In this work, we present an in-depth study of the simultaneous execution of kernels on the GPU. We establish the necessary conditions for executing kernels simultaneously, identify the factors that influence competition, and describe a model that can predict performance degradation. Finally, we validate the model using synthetic and real-world kernels with different computation and memory requirements.
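One necessary condition for concurrency of the kind discussed above is that blocks from different kernels fit within a streaming multiprocessor's per-SM limits at the same time. The sketch below is a simplified, hedged illustration of that resource check; the limit values are illustrative (on a real device they come from the device properties), and it ignores allocation granularity and other hardware details.

```python
# Illustrative per-SM limits (assumed values, not taken from any specific GPU;
# query the device properties for real numbers).
SM_LIMITS = {"threads": 2048, "registers": 65536, "shared_mem": 49152, "blocks": 16}

def fits_together(blocks, limits=SM_LIMITS):
    """blocks: list of dicts with per-block 'threads', 'registers_per_thread',
    and 'shared_mem' requirements. True if all blocks can co-reside on one SM
    under this simplified model."""
    total = {"threads": 0, "registers": 0, "shared_mem": 0, "blocks": 0}
    for b in blocks:
        total["threads"] += b["threads"]
        total["registers"] += b["threads"] * b["registers_per_thread"]
        total["shared_mem"] += b["shared_mem"]
        total["blocks"] += 1
    return all(total[k] <= limits[k] for k in limits)
```

For example, a 256-thread block and a 512-thread block with moderate register and shared-memory demands can co-reside, whereas two blocks that each request most of the shared memory cannot, regardless of how many threads remain free.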
The increasing amount of resources available on modern GPUs has sparked renewed interest in the problem of sharing their resources among different kernels. The newest generation of GPUs allows the simultaneous execution of kernels, but is still limited by the fact that scheduling decisions are made by the hardware at runtime. Such decisions depend on the order in which kernels are submitted for execution, producing executions in which the GPU does not necessarily achieve its best occupancy. In this work, we present an optimization proposal to reorder kernel submission with two goals: maximizing resource utilization and improving the average turnaround time. We model the assignment of kernels to the GPU as a series of knapsack problems and use a dynamic programming approach to solve them. We evaluate our proposal using kernels with different sizes and resource requirements. Our results show significant gains in average turnaround time and throughput compared with the default kernel submission implemented in modern GPUs.