Running multiple threads on a multi-core processor can significantly improve the performance of a parallel application. However, performance does not necessarily improve as the number of cores increases, so scaling threads and cores effectively is key to achieving optimal performance. Multi-threaded applications suffer from thread synchronization overhead and from negative interference in shared memory, including the last-level cache and main memory. Memory bandwidth also often limits the performance of a multi-threaded workload. In this paper, we propose a method to achieve optimal scalability on a multi-core platform and to predict the bandwidth requirement of parallel workloads for a given number of threads. We employ the proposed method to improve the performance of bandwidth-limited parallel applications. We find that DRAM access proceeds in distinct phases, and we use the highest bandwidth among all phases to predict the performance of a given workload in a multi-threaded environment. We evaluate the proposed method using the gem5 multi-core simulator. The experimental results show that the phase-based bandwidth utilization method can estimate the optimal number of threads for a given parallel workload with low prediction error.
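As a rough illustration of the phase-based idea, and not the model actually used in this paper, the sketch below takes the peak per-thread bandwidth across DRAM access phases and derives the largest thread count that stays within the available memory bandwidth. The function names, the GB/s units, and the assumption that bandwidth demand grows linearly with the number of threads are all hypothetical and chosen only for this example.

```python
import math


def peak_phase_bandwidth(phase_bandwidths_gbps):
    """Return the highest bandwidth observed across all DRAM access phases."""
    if not phase_bandwidths_gbps:
        raise ValueError("at least one phase is required")
    return max(phase_bandwidths_gbps)


def estimate_max_useful_threads(phase_bandwidths_gbps, system_bandwidth_gbps, core_count):
    """Estimate how many threads fit within the memory bandwidth budget.

    Assumption (not from the paper): each added thread demands the same
    peak bandwidth as a single thread, so demand scales linearly.
    """
    peak = peak_phase_bandwidth(phase_bandwidths_gbps)
    if peak <= 0:
        # No measurable DRAM traffic: bandwidth never limits scaling.
        return core_count
    bandwidth_bound = math.floor(system_bandwidth_gbps / peak)
    # Use at least one thread and never more threads than cores.
    return max(1, min(core_count, bandwidth_bound))
```

For example, with made-up single-thread phase bandwidths of 1.0, 2.5, and 1.5 GB/s, a 10 GB/s memory system, and 8 cores, the peak phase (2.5 GB/s) caps the estimate at 4 threads. A real predictor would also need to account for synchronization and cache interference, which this sketch ignores.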