Chris J. Newburn scite author profile

Abstract: The cost of data movement has always been an important concern in high-performance computing (HPC) systems. It has now become the dominant factor in terms of both energy consumption and performance. Support for expression of data locality has been explored in the past, but those efforts have had only modest success in being adopted in HPC applications for various reasons. However, with the increasing complexity of the memory hierarchy and higher parallelism in emerging HPC systems, locality management has acquired a new urgency. Developers can no longer limit themselves to low-level solutions and ignore the potential for productivity and performance portability obtained by using locality abstractions. Fortunately, the trend emerging in recent literature on the topic alleviates many of the concerns that got in the way of their adoption by application developers. Data locality abstractions are available in the forms of libraries, data structures, languages and runtime systems; a common theme is increasing productivity without sacrificing performance. This paper examines these trends and identifies commonalities that can combine various locality concepts to develop a comprehensive approach to expressing and managing data locality on future large-scale high-performance computing systems.


Using interaction costs for microarchitectural bottleneck analysis

Fields, Bodík, Hill et al.


Intel's Array Building Blocks: A retargetable, dynamic compiler and embedded language

Newburn, Liu et al. 2011


Our ability to create systems with large amounts of hardware parallelism is exceeding the average software developer's ability to effectively program them. This is a problem that plagues our industry. Since the vast majority of the world's software developers are not parallel programming experts, making it easy to write, port, and debug applications with sufficient core and vector parallelism is essential to enabling the use of multi- and many-core processor architectures. However, hardware architectures and vector ISAs are also shifting and diversifying quickly, making it difficult for a single binary to run well on all possible targets. Because of this, retargetability and dynamic compilation are of growing relevance. This paper introduces Intel® Array Building Blocks (ArBB), which is a retargetable dynamic compilation framework. This system focuses on making it easier to write and port programs so that they can harvest data and thread parallelism on both multi-core and heterogeneous many-core architectures, while staying within standard C++. ArBB interoperates with other programming models to help meet the demands we hear from customers for a solution with both greater programmer productivity and good performance. This work makes contributions in language features, compiler architecture, code transformations and optimizations. It presents performance data from the current beta release of ArBB and quantitatively shows the impact of some key analyses, enabling transformations and optimizations for a variety of benchmarks that are of interest to our customers.


Offload Compiler Runtime for the Intel® Xeon Phi Coprocessor

Newburn, Dmitriev, Narayanaswamy et al. 2013


Heterogeneous Streaming

Newburn, Bansal, Wood et al. 2016




Trends in Data Locality Abstractions for HPC Systems
