In this paper we describe the ARM Scalable Vector Extension (SVE). Several goals guided the design of the architecture. First was the need to extend the vector processing capability associated with the ARM AArch64 execution state to better address the compute requirements in domains such as high performance computing (HPC), data analytics, computer vision and machine learning. Second was the desire to introduce an extension that can scale across multiple implementations, both now and into the future, allowing CPU designers to choose the vector length most suitable for their power, performance and area targets. Finally, the architecture should avoid imposing a software development cost as the vector length changes and, where possible, reduce it by improving the reach of compiler auto-vectorization technologies. We believe SVE achieves these goals. It allows implementations to choose a vector register length between 128 and 2048 bits. It supports a vector-length agnostic programming model that allows code to run and scale automatically across all vector lengths without recompilation. Finally, it introduces several innovative features that begin to overcome some of the traditional barriers to auto-vectorization.
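The abstract describes SVE at the architecture level and gives no source code. As a minimal sketch of the vector-length agnostic style it enables, the fragment below writes a SAXPY loop using the ARM C Language Extensions (ACLE) intrinsics for SVE; the routine itself is illustrative and not taken from the paper.

    #include <arm_sve.h>
    #include <stdint.h>

    /* Vector-length-agnostic SAXPY (y = a*x + y). svcntw() reports how many
     * 32-bit lanes the running implementation provides, and the svwhilelt
     * predicate deactivates excess lanes in the final iteration, so the same
     * binary runs correctly on any vector length from 128 to 2048 bits. */
    void saxpy(int64_t n, float a, const float *x, float *y) {
        svfloat32_t va = svdup_n_f32(a);
        for (int64_t i = 0; i < n; i += svcntw()) {
            svbool_t pg = svwhilelt_b32(i, n);       /* mask of active lanes */
            svfloat32_t vx = svld1_f32(pg, x + i);   /* predicated loads     */
            svfloat32_t vy = svld1_f32(pg, y + i);
            vy = svmla_f32_x(pg, vy, vx, va);        /* vy += vx * a         */
            svst1_f32(pg, y + i, vy);                /* predicated store     */
        }
    }

Because both the loop step and the predicate are derived from the hardware's own vector length at run time, the same object code scales from a 128-bit to a 2048-bit implementation without recompilation, which is the property the abstract emphasizes.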
Instruction set customization is an effective way to improve processor performance. Critical portions of application dataflow graphs are collapsed for accelerated execution on specialized hardware. Collapsing dataflow subgraphs compresses the latency along critical paths and reduces the number of intermediate results stored in the register file. While custom instructions can be effective, the time and cost of designing a new processor for each application are immense. To overcome this roadblock, this paper proposes a flexible architectural framework to transparently integrate custom instructions into a general-purpose processor. Hardware accelerators are added to the processor to execute the collapsed subgraphs. A simple microarchitectural interface is provided to support a plug-and-play model for integrating a wide range of accelerators into a pre-designed and verified processor core. The accelerators are exploited through static identification and dynamic realization: the compiler is responsible for identifying profitable subgraphs, while the hardware handles discovery, mapping, and execution of compatible subgraphs. This paper presents the design of a plug-and-play transparent accelerator system and evaluates the cost/performance implications of the design.
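The framework in this abstract sits at the compiler/microarchitecture boundary rather than behind a user-visible API, so any source-level example is necessarily illustrative. The hypothetical C fragment below shows the kind of short, dependent ALU chain that a subgraph-identification pass could mark for collapsing; the function name and the particular operation mix are invented for this sketch.

    #include <stdint.h>

    /* Hypothetical candidate subgraph: three dependent ALU operations.
     * Collapsed onto an accelerator, the chain executes as one operation,
     * shortening the critical path from three dependent instructions to one
     * and keeping the intermediates t0 and t1 out of the register file. */
    static inline uint32_t mix32(uint32_t x, uint32_t k) {
        uint32_t t0 = x ^ k;            /* subgraph input stage    */
        uint32_t t1 = t0 + (t0 << 3);   /* intermediate, collapsed */
        uint32_t t2 = t1 ^ (t1 >> 5);   /* subgraph output         */
        return t2;
    }

Under the static-identification/dynamic-realization split described above, the compiler would flag such a region as profitable, and at run time the hardware would map it onto whichever compatible accelerator the core carries, or simply execute the original instruction sequence if none is present, which is what keeps the customization transparent.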