We introduce "aCortex," an extremely energy-efficient, fast, compact, and versatile neuromorphic processor architecture suitable for the acceleration of a wide range of neural network inference models. The most important feature of our processor is a configurable mixed-signal computing array of vector-by-matrix multiplier (VMM) blocks utilizing embedded nonvolatile memory arrays for storing weight matrices. Analog peripheral circuitry for data conversion and high-voltage programming is shared among a large array of VMM blocks to facilitate compact and energy-efficient analog-domain VMM operation across different types of neural network layers. Other unique features of aCortex include a configurable chain of buffers and data buses, a simple and efficient instruction set architecture with its corresponding multiagent controller, a programmable quantization range, and a customized refresh-free embedded dynamic random access memory. The energy-optimal aCortex with 4-bit analog computing precision was designed in a 55-nm process with embedded NOR flash memory. Its physical performance was evaluated using experimental data from testing individual circuit elements and the physical layout of key components for several common benchmarks, namely, Inception-v1 and ResNet-152, two state-of-the-art deep feedforward networks for image classification, and GNMT, Google's deep recurrent network for language translation. The system-level simulation results for these benchmarks show energy efficiencies of 97, 106, and 336 TOp/J, respectively, combined with up to 15 TOp/s computing throughput and 0.27-MB/mm² storage efficiency. These estimated performance results compare favorably with those of previously reported mixed-signal accelerators based on much less mature, aggressively scaled resistive switching memories.
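To make the core operation concrete, the following minimal Python sketch emulates the 4-bit quantized vector-by-matrix multiply that each analog VMM block performs; the uniform quantization scheme, function names, and array sizes are illustrative assumptions for exposition, not a model of the fabricated circuit.

```python
import numpy as np

def quantize(x, bits=4, x_max=1.0):
    """Uniformly quantize values to a signed `bits`-bit grid over [-x_max, x_max].
    A programmable quantization range corresponds to choosing x_max per layer;
    the uniform grid here is an assumption, not the chip's exact scheme."""
    levels = 2 ** (bits - 1) - 1          # e.g., 7 positive levels for 4 bits
    step = x_max / levels
    return np.clip(np.round(x / step), -levels, levels) * step

def vmm(x, W, bits=4):
    """Emulate one VMM block: quantize the input vector and the weight matrix
    (stored in nonvolatile memory cells), then compute y = x @ W, which the
    hardware performs as parallel analog dot products, one per output column."""
    xq = quantize(x, bits)                # data conversion at the block input
    Wq = quantize(W, bits)                # 4-bit analog weight precision
    return xq @ Wq

# Toy usage: a 4-input, 3-output layer with random weights
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=4)
W = rng.uniform(-1, 1, size=(4, 3))
print(vmm(x, W))
```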