Silicon-based static random access memories (SRAM) and digital Boolean logic have been the workhorses of state-of-the-art computing platforms. Despite tremendous strides in scaling the ubiquitous metal-oxide-semiconductor transistor, the underlying von Neumann computing architecture has remained unchanged. The limited throughput and energy efficiency of state-of-the-art computing systems result, to a large extent, from the well-known von Neumann bottleneck. The energy and throughput inefficiency of von Neumann machines has been accentuated in recent times by the present emphasis on data-intensive applications like artificial intelligence, machine learning, and cryptography. A possible approach toward mitigating the overhead associated with the von Neumann bottleneck is to enable in-memory Boolean computations. In this manuscript, we present an augmented version of the conventional SRAM bitcell, called X-SRAM, with the ability to perform in-memory vector Boolean computations in addition to the usual memory storage operations. We propose at least six different schemes for enabling in-memory vector computations, including NAND, NOR, IMP (implication), and XOR logic gates, for two bitcell topologies: the 8T cell and the 8+T differential cell. In addition, we present a novel 'read-compute-store' scheme, wherein the computed Boolean function can be stored directly in the memory without latching the data and carrying out a subsequent write operation. The feasibility of the proposed schemes has been verified using predictive transistor models and detailed Monte Carlo variation analysis. As an illustration, we also demonstrate the efficacy of the proposed in-memory computations by implementing the AES (Advanced Encryption Standard) algorithm on a modified von Neumann machine in which the conventional SRAM is replaced by X-SRAM. Our simulations indicate that up to 75% of memory accesses can be saved using the proposed techniques.
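As a rough illustration of the logic the X-SRAM array evaluates, the following Python sketch models the in-memory vector Boolean operations at the behavioral level. The function names and word width are our own illustrative choices; no bitline or sense-amplifier circuitry is modeled.

```python
# Hypothetical behavioral model of X-SRAM's in-memory vector Boolean
# operations: it mimics the bit-wise result the array would produce on
# its read bitlines when two stored words are accessed together. It does
# not model the underlying circuit.

def xsram_compute(word_a: int, word_b: int, op: str, width: int = 32) -> int:
    """Bit-wise Boolean function of two stored words, computed in one pass."""
    mask = (1 << width) - 1
    if op == "NAND":
        return ~(word_a & word_b) & mask
    if op == "NOR":
        return ~(word_a | word_b) & mask
    if op == "XOR":
        return (word_a ^ word_b) & mask
    if op == "IMP":  # material implication: a IMP b == (NOT a) OR b
        return (~word_a | word_b) & mask
    raise ValueError(f"unsupported op: {op}")

# 'read-compute-store': in the real array the result is written back to a
# destination row without an explicit latch-and-write sequence; here we
# simply assign it to a third address.
memory = {0: 0b1100, 1: 0b1010}
memory[2] = xsram_compute(memory[0], memory[1], "NAND", width=4)
assert memory[2] == 0b0111
```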
Large-scale digital computing relies almost exclusively on the von Neumann architecture, which comprises separate units for storage and computation. The energy-expensive transfer of data from the memory units to the computing cores results in the well-known von Neumann bottleneck. Various approaches aimed at bypassing this bottleneck are being extensively explored in the literature. These include in-memory computing based on both CMOS and beyond-CMOS technologies, wherein modifications to the memory array allow vector computations to be carried out as close to the memory units as possible. In-memory techniques based on CMOS technology are of special importance due to the ubiquitous presence of field-effect transistors and the resulting ease of large-scale manufacturing and commercialization. Meanwhile, perhaps the most important computation required for applications like machine learning is the dot-product operation. Emerging non-volatile memristive technologies have been shown to be very efficient at computing analog dot products in situ. The memristive analog computation of the dot product is much faster than digital in-memory bit-wise vector Boolean computations. However, challenges in large-scale manufacturing, coupled with the limited endurance of memristors, have hindered rapid commercialization of memristor-based computing solutions. In this work, we show that a standard 8-transistor (8T) digital SRAM array can be configured as an analog-like in-memory multi-bit dot-product engine. By applying appropriate analog voltages to the read ports of the 8T SRAM array and sensing the output current, an approximate analog-digital dot-product engine can be implemented. We present two different configurations for enabling multi-bit dot-product computations in the 8T SRAM cell array without modifying the standard bitcell structure. We also demonstrate the robustness of the proposal in the presence of non-idealities such as line resistances and transistor threshold-voltage variations. Since our proposal preserves the standard 8T SRAM array structure, it can be used as a storage element with standard read-write instructions and as an on-demand analog-like dot-product accelerator.
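To make the dot-product mapping concrete, the sketch below is a hedged behavioral model of the 8T read-port idea: it assumes each cell's read current is linear in the analog input voltage and gated by the stored bit, with the read bitline summing the currents of a column. The linear-conductance assumption and all names are ours, not taken from the paper.

```python
import numpy as np

# Behavioral model of an 8T-SRAM column as an analog dot-product engine:
# per-cell current ~ g_on * V_in when the stored bit is 1, and the bitline
# accumulates the column's currents (Kirchhoff's current law).

def sram_column_dot(stored_bits: np.ndarray, input_voltages: np.ndarray,
                    g_on: float = 1.0) -> float:
    """Approximate bitline current for one column: an analog dot product
    of the analog inputs with the column's stored bits."""
    return g_on * float(np.dot(input_voltages, stored_bits))

def multibit_dot(weight_bits: np.ndarray, inputs: np.ndarray) -> float:
    """Multi-bit weights stored across adjacent columns (one bit per
    column, MSB first) are recombined digitally with powers of two."""
    n_bits = weight_bits.shape[1]
    return sum(sram_column_dot(weight_bits[:, b], inputs) * 2 ** (n_bits - 1 - b)
               for b in range(n_bits))

weights = np.array([[1, 0], [1, 1], [0, 1]])  # 2-bit weights: 2, 3, 1
inputs = np.array([0.5, 1.0, 0.25])           # analog read-port voltages
print(multibit_dot(weights, inputs))          # 0.5*2 + 1.0*3 + 0.25*1 = 4.25
```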
Deep neural networks are a biologically inspired class of algorithms that have recently demonstrated state-of-the-art accuracy in large-scale classification and recognition tasks. Hardware acceleration of deep networks is of paramount importance to ensure their ubiquitous presence in future computing platforms. Indeed, a major landmark enabling efficient hardware accelerators for deep networks is the recent advance from the machine-learning community demonstrating the viability of aggressively scaled deep binary networks. In this paper, we demonstrate how deep binary networks can be accelerated in modified von Neumann machines by enabling binary convolutions within the SRAM array. In general, a binary convolution consists of bit-wise XNOR followed by a population count (popcount). We present two proposals: one based on a charge-sharing approach to perform vector XNORs and an approximate popcount, and another based on bit-wise XNORs followed by a digital bit-tree adder for an accurate popcount. We highlight the trade-offs in circuit complexity, speed-up, and classification accuracy for both approaches. Key techniques presented in this manuscript include the use of a low-precision, low-overhead ADC to achieve a fairly accurate popcount in the charge-sharing scheme, and the sectioning of the SRAM array by adding switches on the read bitlines, thereby improving parallelism. Our results on the benchmark CIFAR-10 image classification dataset with a binarized neural network architecture show energy improvements of 6.1× and 2.3× for the two proposals, compared to conventional SRAM banks. In terms of latency, improvements of 15.8× and 8.1× were achieved for the two respective proposals.
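The XNOR-popcount formulation of a binary convolution rests on a standard identity: with {-1, +1} values encoded as {0, 1} bits, the dot product of two n-element binarized vectors equals 2*popcount(XNOR(x, w)) - n. A minimal Python sketch of this identity (variable names are illustrative):

```python
# Binary convolution kernel: bit-wise XNOR followed by a popcount.
# Encoding: -1 -> 0, +1 -> 1, vectors packed LSB-first into integers.

def binary_dot(x_bits: int, w_bits: int, n: int) -> int:
    """Dot product of two n-long {-1,+1} vectors packed as bits."""
    mask = (1 << n) - 1
    xnor = ~(x_bits ^ w_bits) & mask       # 1 where the signs agree
    popcount = bin(xnor).count("1")        # number of agreements
    return 2 * popcount - n                # agreements minus disagreements

# x = (+1, -1, +1, +1) -> 0b1101, w = (+1, +1, -1, +1) -> 0b1011
print(binary_dot(0b1101, 0b1011, 4))       # 0, same as sum of x_i * w_i
```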
'In-memory computing' is being widely explored as a novel computing paradigm to mitigate the well-known memory bottleneck. This emerging paradigm aims to embed some aspects of computation inside the memory array, thereby avoiding frequent and expensive movement of data between the compute unit and the storage memory. In-memory computing with silicon memories has been widely explored on various memory bitcells. Embedding computation inside the six-transistor (6T) SRAM array is of special interest since it is the most widely used on-chip memory. In this paper, we present a novel in-memory multiply-and-accumulate operation capable of performing parallel dot products within the 6T SRAM array without any changes to the standard bitcell. We further study the effect of circuit non-idealities and process variations on the accuracy of the LeNet-5 and VGG neural network architectures on the MNIST and CIFAR-10 datasets, respectively. The proposed in-memory dot-product mechanism achieves 88.8% and 99% accuracy for CIFAR-10 and MNIST, respectively. Compared to a standard von Neumann system, the proposed system is 6.24× better in energy consumption and 9.42× better in delay.
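As a loose behavioral sketch, not the paper's circuit model, the following Python snippet mimics the in-memory multiply-and-accumulate: an analog input stands in for the word-line drive, the stored bit gates each cell's contribution, and additive Gaussian noise stands in for the threshold-voltage variation studied in the paper. The noise magnitude and all names are purely illustrative assumptions.

```python
import numpy as np

# Behavioral stand-in for a 6T-SRAM column performing multiply-accumulate:
# per-cell contribution = analog input * stored bit, perturbed by Gaussian
# noise as a proxy for threshold-voltage (Vt) variation; the bitline sums
# all contributions. The sigma below is illustrative, not from the paper.

rng = np.random.default_rng(0)

def in_memory_mac(inputs: np.ndarray, stored_bits: np.ndarray,
                  vt_sigma: float = 0.02) -> float:
    per_cell = inputs * stored_bits                           # ideal multiply
    noise = rng.normal(0.0, vt_sigma, per_cell.shape) * stored_bits
    return float(np.sum(per_cell + noise))                    # accumulation

inputs = np.array([0.8, 0.2, 0.5, 1.0])
bits = np.array([1, 0, 1, 1])
print(in_memory_mac(inputs, bits))   # close to the ideal 0.8 + 0.5 + 1.0 = 2.3
```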