In this paper, we present a digital processing-in-memory (DPIM) architecture configured as a stride edge-detection search frequency neural network (SE-SFNN), which is trained through spike-location-dependent plasticity (SLDP), a learning mechanism reminiscent of spike-timing-dependent plasticity. This mechanism allows rapid online learning as well as a simple memory-based implementation. In particular, we employ a ternary data scheme to take advantage of ternary content-addressable memory (TCAM). The scheme uses a ternary representation of the image pixels, and the TCAMs are arranged in a two-layer format to significantly reduce the computation time. The first layer applies several filtering kernels, and the second layer reorders the pattern dictionaries of the TCAMs to place the most frequent patterns at the top of each supervised TCAM dictionary. Numerous TCAM blocks in both layers operate in a massively parallel fashion on digital ternary values. No complicated multiply operations are performed, and learning proceeds in a feedforward scheme. This enables rapid, robust learning as a trade-off with the parallel memory block size. Furthermore, we propose a method to reduce the TCAM memory size using a two-tiered minor-to-major promotion (M2MP) of frequently occurring patterns. This reduction scheme is performed concurrently with the learning operation without incurring a preconditioning overhead. We show that with minimal circuit overhead, the required memory size is reduced by 84.4% and the total number of clock cycles required for learning decreases by 97.31%, while the accuracy decreases by only 1.12%. We classified images with 94.58% accuracy on the MNIST dataset. Using a 100 MHz clock, our simulation results show that MNIST training takes about 6.3 ms while dissipating less than 4 mW of average power. In terms of inference speed, the trained hardware is capable of processing 5,882,352 images per second.
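To make the ternary scheme and the frequency-based dictionary reordering concrete, the following Python sketch models one supervised TCAM block in software. It is a minimal behavioral sketch under our own assumptions (the thresholds in ternarize(), the TCAMDictionary class, and the linear match scan are illustrative placeholders), not the authors' hardware implementation.

```python
# Behavioral sketch, not the paper's RTL: a ternary-encoded image patch is matched
# against a TCAM-like dictionary, and frequently matched patterns are moved toward
# the top of the dictionary, in the spirit of the SLDP learning rule described above.
import numpy as np

def ternarize(patch, lo=85, hi=170):
    """Map 8-bit pixels to ternary symbols: 0 (dark), 'X' (don't care), 1 (bright).
    The thresholds lo/hi are illustrative assumptions."""
    out = []
    for p in patch.flatten():
        if p < lo:
            out.append(0)
        elif p > hi:
            out.append(1)
        else:
            out.append('X')          # mid-range pixels treated as don't care
    return tuple(out)

class TCAMDictionary:
    """Software stand-in for one supervised TCAM block (one per class)."""
    def __init__(self):
        self.entries = []             # list of (pattern, match_count)

    def match(self, key):
        """Return the index of the first stored pattern compatible with the key."""
        for i, (pat, _) in enumerate(self.entries):
            if all(a == b or a == 'X' or b == 'X' for a, b in zip(pat, key)):
                return i
        return None

    def learn(self, key):
        """Feedforward learning step: count matches and promote frequent patterns."""
        i = self.match(key)
        if i is None:
            self.entries.append((key, 1))   # unseen pattern enters at the bottom
            return
        pat, cnt = self.entries[i]
        self.entries[i] = (pat, cnt + 1)
        # reorder so the most frequent patterns sit at the top of the dictionary
        self.entries.sort(key=lambda e: -e[1])

# Example: learn one 3x3 patch for the class "7" (random data for illustration only)
dict7 = TCAMDictionary()
dict7.learn(ternarize(np.random.randint(0, 256, (3, 3))))
```

The sort step mirrors the idea that frequently located patterns migrate toward the top of each dictionary; in hardware this reordering is what shortens the average search latency.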
INDEX TERMS Digital processing in memory, fast training, in-memory computation, TCAM
I. INTRODUCTION
Recently, various products using artificial intelligence algorithms have been released, including home appliances such as refrigerators and air conditioners, automobiles for autonomous driving, and smart factories. These products collect data from edge devices and send the collected data to a server, where learning and computation are performed [1]. However, in this case, there is a delay in transmitting the data to the server and receiving the neural network weights back. In addition, user-specific learning cannot be performed because the network is not trained using only the data generated from the user's edge device. Due to these shortcomings, Artificial Intelligence of Things (AIoT) technology that can learn on edge devices was recently introduced [2]. Unlike the existing convolutional neural network (CNN) and deep neural network (DNN) methods, in which heavy amounts of iterative computation between tensors are