Thread parallelism and single-thread performance are two important factors affecting the performance of kernel functions, and both are closely tied to register allocation. Adjusting thread parallelism to optimize GPU register allocation can effectively improve the performance of heterogeneous programs. We obtain the number of vector registers a kernel function requires by counting virtual registers during its compilation, and combine this count with the number of wavefronts used to launch the kernel for an overall performance analysis. On this basis we propose RAW, a compilation method for the collaborative optimization of register allocation and thread management on AMDGPU, implemented in the LLVM compiler. Experiments show that the method achieves a speedup of about 1.12x on the Rodinia benchmark suite and about 1.4x on the QUDA application.
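The trade-off between per-thread register usage and the number of resident wavefronts can be illustrated with a minimal sketch. This is not the RAW method itself. The hardware parameters (a budget of 256 vector registers per lane on each SIMD, an allocation granule of 4, and a cap of 10 wavefronts per SIMD) are illustrative assumptions that vary between AMDGPU targets, and both function names are hypothetical.

```python
def max_waves_per_simd(vgprs_per_thread, vgpr_budget=256, granule=4, wave_cap=10):
    """Upper bound on resident wavefronts per SIMD imposed by VGPR usage.

    All default parameters are assumed, illustrative values.
    """
    if vgprs_per_thread <= 0:
        return wave_cap
    # Round the request up to the next multiple of the allocation granule.
    allocated = -(-vgprs_per_thread // granule) * granule
    if allocated > vgpr_budget:
        return 0  # the kernel does not fit without spilling
    return min(wave_cap, vgpr_budget // allocated)


def vgpr_cap_for_waves(target_waves, vgpr_budget=256, granule=4):
    """Largest per-thread VGPR count that still allows target_waves resident wavefronts."""
    if target_waves <= 0:
        raise ValueError("target_waves must be positive")
    per_wave = vgpr_budget // target_waves
    # Round down to a multiple of the allocation granule.
    return (per_wave // granule) * granule


if __name__ == "__main__":
    for vgprs in (24, 25, 64, 128, 129):
        print(vgprs, "VGPRs ->", max_waves_per_simd(vgprs), "waves")
    for waves in (10, 4, 3):
        print(waves, "waves -> at most", vgpr_cap_for_waves(waves), "VGPRs")
```

Under these assumed parameters, one extra register (24 to 25) lowers the wavefront bound, which is why a compiler may weigh a tighter register cap, and the possible spilling it causes, against the gain in parallelism.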