Today, a large number of applications depend on deep neural networks (DNNs) to process data and perform complicated tasks under strict power and latency constraints. Processing-in-memory (PIM) platforms are therefore actively explored as a promising approach to improve the throughput and energy efficiency of DNN computing systems. Several PIM architectures adopt resistive non-volatile memories as their main unit to build crossbar-based accelerators for DNN inference. However, these structures suffer from several drawbacks, such as limited reliability, low accuracy, high ADC/DAC power consumption and area, and high write energy. In this paper, we present a new mixed-signal in-memory architecture based on the bit-decomposition of the multiply-and-accumulate (MAC) operations. Our in-memory inference architecture uses a single FeFET as a non-volatile memory cell. Compared to prior work, this system architecture provides a high level of parallelism while using only 3-bit ADCs, and it eliminates the need for any DAC. In addition, it offers flexibility and very high utilization efficiency even for varying tasks and loads. Simulations demonstrate that we outperform state-of-the-art efficiencies with 36.5 TOPS/W and can pack 2.05 TOPS with 8-bit activation and 4-bit weight precision in an area of 4.9 mm² using 22 nm FDSOI technology. Employing binary operation, we obtain 1169 TOPS/W and over 261 TOPS/W/mm² at the system level.
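The abstract does not spell out the decomposition scheme, but the general idea of a bit-decomposed MAC can be sketched as follows. In this minimal C++ model (an illustration, not the paper's actual hardware), 8-bit activations and 4-bit weights are split into bit planes; each bit-plane pair produces a small 1-bit-by-1-bit partial sum (the analogue column current a low-resolution ADC would digitize), and the full dot product is rebuilt by shift-and-add. With few active rows per column, each partial sum has a small dynamic range, which is why a 3-bit ADC can suffice.

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Illustrative bit-decomposed MAC: 8-bit activations, 4-bit weights.
// Each (activation-bit, weight-bit) plane yields a popcount-style partial
// sum; in a crossbar this would be the analogue column readout.
int bit_decomposed_mac(const std::vector<uint8_t>& act,
                       const std::vector<uint8_t>& wgt) {
    int result = 0;
    for (int a = 0; a < 8; ++a) {          // activation bit planes
        for (int w = 0; w < 4; ++w) {      // weight bit planes
            int partial = 0;
            for (std::size_t i = 0; i < act.size(); ++i)
                partial += ((act[i] >> a) & 1) & ((wgt[i] >> w) & 1);
            // With <= 7 active rows per column, 'partial' fits in 3 bits,
            // matching the low ADC resolution claimed above (assumption).
            result += partial << (a + w);  // shift-and-add recombination
        }
    }
    return result;
}

int main() {
    std::vector<uint8_t> act = {200, 17, 96, 5};
    std::vector<uint8_t> wgt = {9, 3, 15, 1};
    int ref = 0;
    for (std::size_t i = 0; i < act.size(); ++i) ref += act[i] * wgt[i];
    std::printf("bit-decomposed: %d, reference: %d\n",
                bit_decomposed_mac(act, wgt), ref);
    return 0;
}
```

Both values agree because the dot product is linear in the bit planes: each activation/weight pair expands as a sum of its bits scaled by powers of two.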
Increasing complexity and heterogeneity lead to systems that combine aspects of both digital hardware/software and mixed-signal embedded systems. A major difficulty is that the components for mixed-signal systems are designed bottom-up, while a digital hardware/software system is designed top-down. Often this requires co-simulation, in practice involving multiple simulators from different vendors and on different platforms. Unfortunately, setting up co-simulations is a time-consuming task and is therefore done only a few times, for verification purposes. In this paper we show how a plain SystemC simulation can be connected to Saber. A proxy module interfaces to the SystemC simulation and relays signals to Saber. A special signal synchronisation and update scheme ensures the availability of current analogue values to SystemC starting from the very beginning of each time step. Furthermore, we introduce a mechanism for automatically connecting SystemC modules and show how it can be used to implement a graphical SystemC editor. A design example which compares a SystemC-to-Saber co-simulation with a functionally identical SystemC-AMS simulation is also included.
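The proxy-module idea can be illustrated with a short SystemC sketch. This is a minimal assumption-laden model, not the paper's implementation: saber_send/saber_recv below are hypothetical stand-ins (stubbed locally) for whatever inter-process link the actual Saber coupling uses, and the fixed 1 µs synchronisation step is assumed. The key point matches the abstract: the analogue value is written at the very beginning of each time step, before downstream SystemC modules evaluate.

```cpp
#include <systemc.h>

// Hypothetical stand-ins for the Saber communication link (stubbed here
// so the sketch compiles; a real coupling would use IPC or a vendor API).
static void   saber_send(double v) { (void)v; /* push value to Saber */ }
static double saber_recv()         { return 0.0; /* pull Saber output */ }

SC_MODULE(SaberProxy) {
    sc_in<double>  to_analog;    // digital-side output, relayed to Saber
    sc_out<double> from_analog;  // analogue value made visible to SystemC

    SC_CTOR(SaberProxy) { SC_THREAD(relay); }

    void relay() {
        while (true) {
            // Update the analogue value at the start of each step so the
            // rest of the SystemC model always sees the current value.
            from_analog.write(saber_recv());
            saber_send(to_analog.read());
            wait(1, SC_US);  // assumed fixed synchronisation step
        }
    }
};

int sc_main(int argc, char* argv[]) {
    sc_signal<double> d2a, a2d;
    SaberProxy proxy("proxy");
    proxy.to_analog(d2a);
    proxy.from_analog(a2d);
    sc_start(10, SC_US);
    return 0;
}
```

Because the relay runs as an SC_THREAD that writes before it waits, the first analogue sample is already available at simulation time zero.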
Recent advances in artificial intelligence (AI) have led to successful solutions for numerous applications by utilizing deep neural network (DNN) architectures. [1] Hence, specialized hardware accelerators have been developed to facilitate high-speed computations for these data-intensive workloads. [2] While these computational engines have enabled several advanced applications at cloud scale, the true benefits of AI can be realized by enabling low-power edge computing. For Internet of Things (IoT) devices with constrained area and power, performing high-precision computations becomes infeasible. Quantization of neural network (NN) weights and activations has been explored as a means to reduce the energy cost of computations while preserving computational accuracy. [3] However, memory capacity and bandwidth remain the primary limiting factors for such systems.
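To make the quantization point concrete, here is a minimal C++ sketch of uniform symmetric per-tensor quantization to 4-bit signed integers. The abstract does not specify a scheme, so this particular mapping (single scale, range [-8, 7]) is an assumption chosen for illustration.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <vector>

// Assumed scheme: uniform symmetric quantization with one per-tensor scale.
struct Quantized {
    std::vector<int8_t> q;  // 4-bit values stored in int8_t, range [-8, 7]
    float scale;            // dequantized value = q * scale
};

Quantized quantize4(const std::vector<float>& x) {
    float maxabs = 0.0f;
    for (float v : x) maxabs = std::max(maxabs, std::fabs(v));
    float scale = maxabs > 0.0f ? maxabs / 7.0f : 1.0f;
    Quantized out{{}, scale};
    out.q.reserve(x.size());
    for (float v : x) {
        int qi = static_cast<int>(std::lround(v / scale));
        out.q.push_back(static_cast<int8_t>(std::clamp(qi, -8, 7)));
    }
    return out;
}

int main() {
    std::vector<float> w = {0.61f, -0.20f, 0.05f, -0.97f};
    Quantized qw = quantize4(w);
    for (std::size_t i = 0; i < w.size(); ++i)
        std::printf("%+.2f -> %d (dequant %+.3f)\n",
                    w[i], qw.q[i], qw.q[i] * qw.scale);
    return 0;
}
```

The energy argument follows from the narrower datapath: a 4-bit integer multiply and the associated memory traffic cost far less than their 32-bit floating-point counterparts, at the price of bounded rounding error.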