Proceedings of the 2016 International Conference on Supercomputing
DOI: 10.1145/2925426.2926294
Proteus

Abstract: This work exploits the tolerance of Deep Neural Networks (DNNs) to reduced precision numerical representations and specifically, their recently demonstrated ability to tolerate representations of different precision per layer while maintaining accuracy. This flexibility enables improvements over conventional DNN implementations that use a single, uniform representation. This work proposes Proteus, which reduces the data traffic and storage footprint needed by DNNs, resulting in reduced energy and improved area efficiency…
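To make the per-layer idea concrete, here is a minimal NumPy sketch of quantizing each layer's weights to its own bit width. The uniform-scale scheme, the function name `quantize_per_layer`, and the bit widths in `layer_bits` are illustrative assumptions for this page, not the paper's actual Proteus encoding, which additionally packs the reduced-precision values in memory.

```python
import numpy as np

def quantize_per_layer(weights, bits):
    """Quantize one layer's weights to a signed fixed-point grid of `bits` bits."""
    # Scale so the largest-magnitude weight maps to the top of the integer range.
    scale = np.max(np.abs(weights)) / (2 ** (bits - 1) - 1)
    q = np.round(weights / scale)   # integers in [-(2^(bits-1)-1), 2^(bits-1)-1]
    return q * scale                # dequantize to simulate the accuracy impact

# Hypothetical per-layer bit widths; in practice each width would be chosen
# as the smallest that keeps overall network accuracy.
layer_bits = {"conv1": 10, "conv2": 8, "fc1": 6}
layers = {name: np.random.randn(64, 64).astype(np.float32) for name in layer_bits}
quantized = {name: quantize_per_layer(layers[name], b) for name, b in layer_bits.items()}
```

The point of the per-layer choice is that layers differ in sensitivity: a layer that tolerates 6 bits should not pay the traffic and storage cost of a uniform 16-bit representation.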

Cited by 56 publications (3 citation statements) · References 26 publications
“…Even though VS shares the benefits from recent advances in ISP [6,9,14,23,31,33,35,40,67,75,76,82,85,89,96] and near-data processing [2,17,20,39,48,50,63,80,84], these frameworks need the mechanisms that VS offers in order to execute approximate computing applications efficiently. And while using approximate computing in channel encoding [38,62] and the memory controller [30] can achieve an effect similar to that of VS in terms of reducing data-movement overhead, VS is independent of these projects and requires no changes in hardware.…”
Section: Other Related Work
confidence: 99%
“…[1] proposes a CNN accelerator design that can skip computations on input values that are zeros. [14,21] reduce an accelerator's bandwidth and buffer use. [21] uses per-layer data quantization and matrix decomposition, whereas [14] uses per-layer numerical precision reduction.…”
Section: Related Work
confidence: 99%
“…[14,21] reduce an accelerator's bandwidth and buffer use. [21] uses per-layer data quantization and matrix decomposition, whereas [14] uses per-layer numerical precision reduction. [2] uses a fused-layer technique to reduce bandwidth use of convolutional layers.…”
Section: Related Work
confidence: 99%
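For context on the zero-skipping idea cited above, the following is a software analogue of what such an accelerator exploits in hardware; `dot_skip_zeros` is a hypothetical name for this sketch, not the cited design's mechanism.

```python
import numpy as np

def dot_skip_zeros(activations, weights):
    """Dot product that visits only nonzero activations.

    Since ReLU layers produce many zero activations, multiply-accumulates
    whose activation operand is zero contribute nothing and can be skipped.
    """
    nz = np.flatnonzero(activations)  # positions with nonzero activations
    return float(np.dot(activations[nz], weights[nz]))

a = np.maximum(np.random.randn(1024), 0.0)  # ReLU-like: roughly half zeros
w = np.random.randn(1024)
assert np.isclose(dot_skip_zeros(a, w), np.dot(a, w))  # result is unchanged
```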