Deep learning (DL) is now widely applied to tasks that used to be performed with explicit sets of instructions. Improving DL capability requires deep neural networks (DNNs) to become deeper and larger. The wide use of DL with deep and large DNNs imposes an immense workload on hardware, a trend that is expected to continue. To support this trend, new hardware that accelerates the major DL operations at reduced power consumption is strongly demanded. The current mainstream hardware for DL is the general-purpose graphics processing unit (GPGPU), which enables parallel multiply-accumulate (MAC) operations, the dominant operational workload, but consumes enormous power. DL accelerators aim to achieve performance and power efficiency beyond GPGPUs.

Several DL accelerators (popularly referred to as neural processing units) have already been commercialized. [1,2] Next-generation DL acceleration, expected in the near term, is considered to be based on memory-centric architectures that minimize data movement, which is known to consume an immense amount of power, by computing near or within the memory domain. [3] Near-data processing (NDP) realizes parallel floating-point MAC operations in the vicinity of the memory domain, minimizing data movement. [4,5] However, this architecture still suffers from the notorious memory-wall issue [6] because of the large latency of random access memory (RAM) accesses. Additionally, NDP leverages its advantages only for memory-bound models, e.g., fully connected networks and recurrent neural networks, in which one weight is used for one operation, unlike convolutional neural networks, in which one weight is reused for many operations.

A workaround is to merge the processing and memory domains into a single domain in which parallel MAC operations are executed in an analog manner. [7-18] This strategy realizes in-place MAC operations, allowing memory access and computation to occur simultaneously in the same domain. We refer to this strategy as analog computing-in-memory (aCIM). Its feasibility has been demonstrated with various types of memory, including volatile memory, e.g., dynamic RAM [7] and static RAM, [8-10] and nonvolatile memory, e.g., magnetic RAM [11] and resistive RAM (RRAM). [12,13,15-17] Among the embedded memories for aCIM, resistance-based nonvolatile RAMs have attracted particular attention because they support parallel analog MAC operations based on Kirchhoff's current law (KCL). In particular, RRAM offers multilevel data representation, so that a single RRAM cell can store a multi-bit weight, enabling parallel fixed-point MAC operations at reduced space and computational complexity. However, the larger the parallelism and the number of levels per cell, the higher the bit resolution required of the analog-to-digital converters (ADCs) to avoid data loss, which incurs prohibitive power consumption and area overhead.

In this regard, it may be necessary to limit the number of levels of a single cell and ...
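To make the KCL-based parallel MAC and the associated ADC-resolution trade-off concrete, the following Python sketch gives a minimal, idealized model of an RRAM crossbar; all array sizes, conductance values, and variable names here are illustrative assumptions and are not taken from this work. Weights are stored as multilevel cell conductances, binary inputs are applied as word-line voltages, and each bit-line current is the analog dot product summed by KCL.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative crossbar parameters (assumed for this sketch only).
N_ROWS = 128      # word lines: inputs applied as read voltages
N_COLS = 64       # bit lines: one dot product accumulated per column
LEVELS = 16       # conductance levels per RRAM cell, i.e., a 4-bit weight
G_MAX = 1e-4      # maximum cell conductance (siemens)
V_READ = 0.2      # read voltage encoding an input bit of 1 (volts)

# Multi-bit weights stored as multilevel cell conductances.
weights = rng.integers(0, LEVELS, size=(N_ROWS, N_COLS))
conductances = weights / (LEVELS - 1) * G_MAX

# Binary input vector applied as word-line voltages.
inputs = rng.integers(0, 2, size=N_ROWS)
voltages = inputs * V_READ

# Kirchhoff's current law: the currents of all cells sharing a bit line
# sum up, so each column current is an analog dot product in one step.
bitline_currents = voltages @ conductances            # shape: (N_COLS,)

# A lossless ADC must distinguish N_ROWS * (LEVELS - 1) + 1 current
# levels, i.e., ceil(log2(N_ROWS * (LEVELS - 1) + 1)) bits of resolution.
distinct_levels = N_ROWS * (LEVELS - 1) + 1
adc_bits = int(np.ceil(np.log2(distinct_levels)))     # 11 bits here

# Quantize with a step of one least-significant MAC unit and compare
# against the digital fixed-point MAC the analog array reproduces.
lsb_current = G_MAX * V_READ / (LEVELS - 1)
codes = np.round(bitline_currents / lsb_current).astype(int)
reference = inputs @ weights
print(f"ADC bits required for lossless readout: {adc_bits}")
print("analog readout matches digital MAC:", np.array_equal(codes, reference))
```

The sketch also shows the scaling problem discussed above: doubling the number of rows (parallelism) or the number of levels per cell adds roughly one bit to the required ADC resolution, and ADC power and area grow steeply with resolution.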