Harry Wagstaff scite author profile

System designers typically use well-studied benchmarks to evaluate and improve new architectures and compilers. We design tomorrow's systems based on yesterday's applications. In this paper we investigate an emerging application, 3D scene understanding, likely to be significant in the mobile space in the near future. Until now, this application could only run in real-time on desktop GPUs. In this work, we examine how it can be mapped to power-constrained embedded systems. Key to our approach is the idea of incremental co-design exploration, where optimization choices that concern the domain layer are incrementally explored together with low-level compiler and architecture choices. The goal of this exploration is to reduce execution time while minimizing power and meeting our quality of result objective. As the design space is too large to exhaustively evaluate, we use active learning based on a random forest predictor to find good designs. We show that our approach can, for the first time, achieve dense 3D mapping and tracking in the real-time range within a 1W power budget on a popular embedded device. This is a 4.8x execution time improvement and a 2.8x power reduction compared to the state-of-the-art.
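The exploration loop described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the cost function `measure`, the two-parameter design space, and the greedy acquisition rule are all hypothetical, and a 1-nearest-neighbour surrogate is swapped in for the random forest predictor so that the example needs only the standard library.

```python
import random

def measure(config):
    """Hypothetical stand-in for running the application and timing it."""
    x, y = config
    return (x - 3) ** 2 + (y - 7) ** 2

def predict(config, samples):
    """1-nearest-neighbour surrogate (a simple stand-in for a random forest):
    predicts the cost of the closest configuration measured so far."""
    nearest = min(
        samples,
        key=lambda c: (c[0] - config[0]) ** 2 + (c[1] - config[1]) ** 2,
    )
    return samples[nearest]

def explore(space, budget, seed=0):
    """Measure only `budget` configurations, chosen with help from the surrogate."""
    rng = random.Random(seed)
    # Start from a few randomly chosen measured designs.
    samples = {c: measure(c) for c in rng.sample(space, 5)}
    while len(samples) < budget:
        unseen = [c for c in space if c not in samples]
        # Assumed acquisition rule: measure the unseen design the surrogate
        # currently predicts to be cheapest.
        candidate = min(unseen, key=lambda c: predict(c, samples))
        samples[candidate] = measure(candidate)
    best = min(samples, key=samples.get)
    return best, samples

space = [(x, y) for x in range(10) for y in range(10)]
best, samples = explore(space, budget=20)
```

Only 20 of the 100 configurations are measured here; the surrogate decides which ones. A real exploration would optimise several objectives at once (execution time, power, quality of result) rather than a single cost.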

SLAMBench2: Multi-Objective Head-to-Head Benchmarking for Visual SLAM

Bodin, Wagstaff, Saeedi, et al. 2018

SLAM is becoming a key component of robotics and augmented reality (AR) systems. While a large number of SLAM algorithms have been presented, there has been little effort to unify the interface of such algorithms, or to perform a holistic comparison of their capabilities. This is a problem since different SLAM applications can have different functional and non-functional requirements. For example, a mobile phone-based AR application has a tight energy budget, while a UAV navigation system usually requires high accuracy. SLAMBench2 is a benchmarking framework to evaluate existing and future SLAM systems, both open and closed source, over an extensible list of datasets, while using a comparable and clearly specified list of performance metrics. A wide variety of existing SLAM algorithms and datasets is supported, e.g. ElasticFusion, InfiniTAM, ORB-SLAM2, OKVIS, and integrating new ones is straightforward and clearly specified by the framework. SLAMBench2 is a publicly available software framework which represents a starting point for quantitative, comparable and validatable experimental research to investigate trade-offs across SLAM systems.
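One common way to reason about trade-offs across several metrics is to keep only the Pareto-optimal results, that is, those not beaten on every metric by some other result. The sketch below assumes three lower-is-better metrics (trajectory error, time per frame, power); the configuration names and all numbers are hypothetical and do not come from SLAMBench2.

```python
def dominates(a, b):
    """True if `a` is no worse than `b` on every metric and strictly better
    on at least one (all metrics are lower-is-better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(results):
    """Keep the entries that no other entry dominates."""
    return {
        name: metrics
        for name, metrics in results.items()
        if not any(dominates(other, metrics) for other in results.values())
    }

# Hypothetical measurements: (trajectory error in m, seconds per frame, watts)
results = {
    "config_a": (0.02, 0.10, 3.0),  # most accurate
    "config_b": (0.05, 0.03, 1.0),  # fastest and lowest power
    "config_c": (0.06, 0.12, 3.5),  # worse than config_a on every metric
}
front = pareto_front(results)
```

Here `front` keeps `config_a` and `config_b`: neither beats the other on all three metrics, so the choice between them depends on whether the application values accuracy or energy more.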

Navigating the Landscape for Real-Time Localization and Mapping for Robotics and Virtual and Augmented Reality

et al. 2018

Visual understanding of 3D environments in real time, at low power, is a huge computational challenge. Often referred to as SLAM (Simultaneous Localisation and Mapping), it is central to applications spanning domestic and industrial robotics, autonomous vehicles, virtual and augmented reality. This paper describes the results of a major research effort to assemble the algorithms, architectures, tools, and systems software needed to enable delivery of SLAM, by supporting applications specialists in selecting and configuring the appropriate algorithm and the appropriate hardware, and compilation pathway, to meet their performance, accuracy, and energy consumption goals. The major contributions we present are (1) tools and methodology for systematic quantitative evaluation of SLAM algorithms, (2) automated, machine-learning-guided exploration of the algorithmic and implementation design space with respect to multiple objectives, (3) end-to-end simulation tools to enable optimisation of heterogeneous, accelerated architectures for the specific algorithmic requirements of the various SLAM algorithmic approaches, and (4) tools for delivering, where appropriate, accelerated, adaptive SLAM solutions in a managed, JIT-compiled, adaptive runtime context.

Fig. 1 (figure not shown; labels: Performance Evaluation, Runtime, Architecture, Compiler and Algorithm, Design Space Exploration, Machine Learning): The objective of the paper is to create a pipeline that aligns computer vision requirements with hardware capabilities. The paper's focus is on three layers: algorithms, compiler and runtime, and architecture. The goal is to develop a system that allows us to achieve power and energy efficiency, speed and runtime improvement, and accuracy/robustness at each layer and also holistically through design space exploration and machine learning techniques.

Early partial evaluation in a JIT-compiled, retargetable instruction set simulator generated from a high-level architecture description

Wagstaff, Gould, Franke, et al. 2013

Modern processor design tools integrate generators for instruction set simulators (ISS) from architecture descriptions into their workflows. Whilst these generated simulators are useful for design evaluation and software development, they suffer from poor performance. We present an ultra-fast JIT-compiled ISS generated from an ArchC description. We also introduce a novel partial evaluation optimisation, which further improves JIT compilation time and code quality. This results in a simulation rate of 510 MIPS for an ARM target across 45 EEMBC and SPEC benchmarks. On average, our ISS is 1.7 times faster than SimIt-ARM, one of the fastest ISS generated from an architecture description.
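The general idea of partial evaluation in a simulator can be shown with a toy example: once an instruction has been decoded, its fields are constants, so the work that depends only on them can be done once at translation time. The sketch below is a minimal illustration under that assumption; the two-opcode ISA, the tuple encoding, and the function names are hypothetical and are not taken from the paper or from ArchC.

```python
# Hypothetical toy ISA: an instruction is (opcode, dest_reg, src_reg, immediate).

def interpret(instr, regs):
    """Generic path: decodes fields and dispatches on the opcode on every run."""
    op, rd, rs, imm = instr
    if op == "addi":
        regs[rd] = regs[rs] + imm
    elif op == "mov":
        regs[rd] = regs[rs]
    else:
        raise ValueError(f"unknown opcode {op!r}")

def specialise(instr):
    """Partially evaluate `interpret` for one fixed instruction.

    Field decoding and opcode dispatch happen once, here. The returned
    function contains only the work that depends on register values.
    """
    op, rd, rs, imm = instr
    if op == "addi":
        if imm == 0:
            # The immediate is known to be zero, so the addition folds away.
            def run(regs):
                regs[rd] = regs[rs]
        else:
            def run(regs):
                regs[rd] = regs[rs] + imm
        return run
    if op == "mov":
        def run(regs):
            regs[rd] = regs[rs]
        return run
    raise ValueError(f"unknown opcode {op!r}")

program = [("addi", 0, 1, 5), ("mov", 2, 0, 0), ("addi", 1, 2, 0)]

# Translate once, then execute the specialised code.
translated = [specialise(instr) for instr in program]
fast_regs = [0, 10, 0]
for run in translated:
    run(fast_regs)

# Reference result from the generic interpreter.
slow_regs = [0, 10, 0]
for instr in program:
    interpret(instr, slow_regs)
```

Both paths leave the registers in the same state; the specialised version simply avoids repeating the decode and dispatch each time an instruction executes. A JIT-compiled simulator would emit native code rather than Python closures.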

Efficient code generation in a region-based dynamic binary translator

Spink, Wagstaff, Franke, et al. 2014

Region-based JIT compilation operates on translation units comprising multiple basic blocks and the possibly cyclic or conditional control flow between them. It promises to reconcile aggressive code optimisation and low compilation latency in performance-critical dynamic binary translators. Whilst various region selection schemes and isolated code optimisation techniques have been investigated, it remains unclear how best to exploit such regions for efficient code generation. Complex interactions with indirect branch tables and translation caches can have adverse effects on performance if not considered carefully. In this paper we present a complete code generation strategy for a region-based dynamic binary translator, which exploits branch type and control flow profiling information to improve code quality for the common case. We demonstrate that using our code generation strategy a competitive region-based dynamic compiler can be built on top of the LLVM JIT compilation framework. For the ARM V5T target ISA and SPEC CPU 2006 benchmarks we achieve execution rates of, on average, 867 MIPS and up to 1323 MIPS on a standard x86 host machine, outperforming state-of-the-art QEMU-ARM by delivering a speedup of 264%.


Harry Wagstaff

Integrating Algorithmic Parameters into Benchmarking and Design Space Exploration in 3D Scene Understanding
