Deep Neural Networks (DNNs) have emerged as the method of choice for solving a wide range of machine learning tasks. Meeting the enormous growth in computational demand posed by DNNs is a key challenge for computing system designers and has most commonly been addressed through the design of custom accelerators. However, these specialized accelerators, which utilize large numbers of multiply-accumulate units and substantial on-chip memory, are prohibitive in many design scenarios (e.g., wearable devices and IoT sensors) due to stringent area and cost constraints. Therefore, accelerating DNNs on these low-power systems, which comprise mainly the indispensable general-purpose processor (GPP) cores, requires new approaches. In this work, we focus on improving the performance of DNNs on GPPs by exploiting a key attribute of DNNs, i.e., sparsity. We propose Sparsity-aware Core Extensions (SPARCE), a set of micro-architectural and ISA extensions that leverage sparsity and are minimally intrusive and low-overhead. We address the key challenges associated with exploiting sparsity in GPPs, viz., dynamically detecting whether an operand (e.g., the result of a load instruction) is zero and subsequently skipping a set of future instructions that use it. To maximize performance benefits, our design ensures that the instructions to be skipped are prevented from even being fetched, since squashing instructions comes with a penalty (e.g., a pipeline stall). SPARCE consists of two key micro-architectural enhancements. First, a Sparsity Register File (SpRF) tracks which registers hold zero. Second, a Sparsity-aware Skip Address (SASA) table identifies instruction sequences that can be skipped and associates them with the specific SpRF registers that trigger the skipping. When an instruction is fetched, SPARCE dynamically pre-identifies whether the following instruction(s) can be skipped and, if so, modifies the program counter accordingly, thereby skipping the redundant instructions and improving performance. We model SPARCE using the gem5 architectural simulator and evaluate our approach on six state-of-the-art image-recognition DNNs in the context of both training and inference using the Caffe deep learning framework. On a scalar microprocessor, SPARCE achieves 1.11×-1.96× speedups across both convolution and fully-connected layers with 10%-90% sparsity, and a 19%-31% reduction in execution time at the overall application level. We also evaluate SPARCE on a 4-way SIMD ARMv8 processor using the OpenBLAS library and demonstrate that SPARCE achieves an 8%-15% reduction in application-level execution time.
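To make the fetch-time skipping mechanism concrete, the following is a minimal, purely illustrative sketch of the interplay between the SpRF and the SASA table described above. It is not the paper's RTL or gem5 model; the class names (SASAEntry, SparceFrontEnd), the table fields (src_reg, skip_count), and the fixed 4-byte instruction size are assumptions made for illustration only.

```python
# Hypothetical, simplified model of the SPARCE front-end behavior described in
# the abstract: an SpRF bit per register marking zero-valued operands, and a
# SASA table mapping a trigger PC to a skippable instruction sequence.
# Field names and sizes are illustrative assumptions, not the paper's design.

class SASAEntry:
    def __init__(self, src_reg, skip_count):
        self.src_reg = src_reg         # register whose zero/non-zero state is checked
        self.skip_count = skip_count   # number of subsequent instructions to skip

class SparceFrontEnd:
    def __init__(self, num_regs=32):
        self.sprf = [False] * num_regs  # True => register currently holds zero
        self.sasa = {}                  # trigger PC -> SASAEntry

    def record_writeback(self, reg, value):
        """Update the SpRF when a register is written (e.g., by a load)."""
        self.sprf[reg] = (value == 0)

    def next_fetch_pc(self, pc, instr_size=4):
        """At fetch time, consult the SASA table; if the associated register is
        zero, advance the PC past the redundant instructions so that they are
        never fetched (avoiding any squash penalty)."""
        entry = self.sasa.get(pc)
        if entry is not None and self.sprf[entry.src_reg]:
            return pc + (1 + entry.skip_count) * instr_size
        return pc + instr_size

# Example: register r5 was earlier loaded with zero, and the SASA entry at
# PC 0x100 records that the 3 instructions following 0x100 (e.g., a
# multiply-accumulate sequence) consume r5 and can therefore be skipped.
fe = SparceFrontEnd()
fe.sasa[0x100] = SASAEntry(src_reg=5, skip_count=3)

fe.record_writeback(5, 0)                 # loaded operand turned out to be zero
assert fe.next_fetch_pc(0x100) == 0x110   # skip 0x104, 0x108, 0x10C entirely

fe.record_writeback(5, 7)                 # non-zero operand: normal sequential fetch
assert fe.next_fetch_pc(0x100) == 0x104
```

The sketch only captures the control-flow effect (redirecting the fetch PC so skipped instructions are never brought into the pipeline); the actual SASA table format, how entries are populated, and how SIMD registers are handled follow the paper, not this toy model.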