As the size of Deep Neural Networks (DNNs) continues to grow to increase accuracy and solve more complex problems, their energy footprint also scales. Weight pruning reduces DNN model size and the computation by removing redundant weights. However, we implemented weight pruning for several popular networks on a variety of hardware platforms and observed surprising results. For many networks, the network sparsity caused by weight pruning will actually hurt the overall performance despite large reductions in the model size and required multiply-accumulate operations. Also, encoding the sparse format of pruned networks incurs additional storage space overhead. To overcome these challenges, we propose Scalpel that customizes DNN pruning to the underlying hardware by matching the pruned network structure to the data-parallel hardware organization. Scalpel consists of two techniques: SIMD-aware weight pruning and node pruning. For low-parallelism hardware (e.g., microcontroller), SIMD-aware weight pruning maintains weights in aligned fixed-size groups to fully utilize the SIMD units. For high-parallelism hardware (e.g., GPU), node pruning removes redundant nodes, not redundant weights, thereby reducing computation without sacrificing the dense matrix format. For hardware with moderate parallelism (e.g., desktop CPU), SIMD-aware weight pruning and node pruning are synergistically applied together. Across the microcontroller, CPU and GPU, Scalpel achieves mean speedups of 3.54x, 2.61x, and 1.25x while reducing the model sizes by 88%, 82%, and 53%. In comparison, traditional weight pruning achieves mean speedups of 1.90x, 1.06x, 0.41x across the three platforms.
Abstract-With shrinking transistor sizes and supply voltages, errors in combinational logic due to radiation particle strikes are on the rise. A broad range of applications will soon require protection from this type of error, requiring an effective and inexpensive solution. Many previously proposed logic protection techniques rely on duplicate logic or latches, incurring high overheads. In this paper, we present a technique for transient error detection using parity trees for power and area efficiency. This approach is highly customizable, allowing adjustment of a number of parameters for optimal error coverage and overhead. We present simulation results comparing our scheme to latch duplication, showing on average greater than 55% savings in area and power overhead for the same error coverage. We also demonstrate adding protection to reach a target logic soft error rate, constituting at best a 59X reduction in the error rate with under 2% power and area overhead.
Dynamic voltage and frequency scaling can provide substantial energy savings but is limited by SRAM since some cells will fail at very low voltages. Due to process variation effects, a small subset of SRAM cells will be more sensitive to voltage reduction, requiring increased margins and limiting energy savings. Since large arrays like caches are most vulnerable to cell failures, recent proposals suggest disabling failing portions of the cache to enable low voltage operation. Although such approaches save power, energy reduction is limited because reducing the effective cache size increases program runtimes. In this paper, we present iPatch, a solution to regain this lost performance and enable energy savings by exploiting the redundancy inherent in superscalar processors. By relying on existing microarchitectural structures and mechanisms to "patch" the faulty parts of caches, we enable further energy reduction with minimal overhead and complexity. Furthermore, because no critical paths or circuits are affected by our implementation, there is no impact on normalvoltage operation. For high cell failure rates, our results show significant energy savings with iPatch as well as an 18% reduction in energy-delay product compared to prior work. 428978-1-4799-8930-0/15/$31.00 ©2015 IEEE
With the advent of general-purpose GPU computing, it is becoming increasingly desirable to protect GPUs from soft errors. For high computation throughout, GPUs must store a significant amount of state and have many execution units. The high power and area costs of full protection from soft errors make selective protection techniques attractive. Such approaches provide maximum error coverage within a fixed area or power limit, but typically treat all errors equally. We observe that for many floating-point-intensive GPGPU applications, small magnitude errors may have little effect on results, while large magnitude errors can be amplified to have a significant negative impact. We therefore propose a novel precision-aware protection approach for the GPU execution logic and register file to mitigate large magnitude errors. We also propose an architecture modification to optimize error coverage for integer computations. Our approach combines selective logic hardening, targeted checker circuits, and intelligent register file encoding for best error protection. We demonstrate that our approach can reduce the mean error magnitude by up to 87% compared to a traditional selective protection approach with the same overhead.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2025 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.