Abstract-The cost of data movement has always been an important concern in high performance computing (HPC) systems. It has now become the dominant factor in terms of both energy consumption and performance. Support for expression of data locality has been explored in the past, but those efforts have had only modest success in being adopted in HPC applications for various reasons. However, with the increasing complexity of the memory hierarchy and higher parallelism in emerging HPC systems, locality management has acquired a new urgency. Developers can no longer limit themselves to low-level solutions and ignore the potential for productivity and performance portability obtained by using locality abstractions. Fortunately, the trend emerging in recent literature on the topic alleviates many of the concerns that got in the way of their adoption by application developers. Data locality abstractions are available in the forms of libraries, data structures, languages and runtime systems; a common theme is increasing productivity without sacrificing performance. This paper examines these trends and identifies commonalities that can combine various locality concepts to develop a comprehensive approach to expressing and managing data locality on future large-scale high-performance computing systems.
Our ability to create systems with large amount of hardware parallelism is exceeding the average software developer's ability to effectively program them. This is a problem that plagues our industry. Since the vast majority of the world's software developers are not parallel programming experts, making it easy to write, port, and debug applications with sufficient core and vector parallelism is essential to enabling the use of multi-and many-core processor architectures. However, hardware architectures and vector ISAs are also shifting and diversifying quickly, making it difficult for a single binary to run well on all possible targets. Because of this, retargetability and dynamic compilation are of growing relevance. This paper introduces Intel® Array Building Blocks (ArBB), which is a retargetable dynamic compilation framework. This system focuses on making it easier to write and port programs so that they can harvest data and thread parallelism on both multi-core and heterogeneous many-core architectures, while staying within standard C++. ArBB interoperates with other programming models to help meet the demands we hear from customers for a solution with both greater programmer productivity and good performance.This work makes contributions in language features, compiler architecture, code transformations and optimizations. It presents performance data from the current beta release of ArBB and quantitatively shows the impact of some key analyses, enabling transformations and optimizations for a variety of benchmarks that are of interest to our customers.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.