Numerous neural network circuits and architectures are presently under active research for application to artificial intelligence and machine learning. Their physical performance metrics (area, time, energy) are estimated. Various types of neural networks (artificial, cellular, spiking, and oscillator) are implemented with multiple CMOS and beyond-CMOS (spintronic, ferroelectric, resistive memory) devices. A consistent and transparent methodology is proposed and used to benchmark this comprehensive set of options across several application cases. Promising architecture/device combinations are identified.

In the last few years, AI/ML has achieved prominent successes, especially those related to deep neural networks (DNN) [7] and convolutional neural networks (CoNN). ML has enabled a revolutionary improvement in the accuracy of image, pattern, and facial recognition, including the treatment of 'big data' online. Growing demand for neural computing is also emerging in robotic control, autonomous vehicles, drones, etc.

One of the main concerns on the minds of developers of AI computing systems is the same as for traditional computing: the power consumption of the chips. The history of traditional computing shows that the commercial success of computing devices and architectures is predicated largely on their physical performance: areal density, speed of operation, and consumed energy, as benchmarked in [8,9]. These ultimately translate into the processing throughput and consumed power of the chips, which are of utmost importance to the user (a first-order sketch of this translation is given at the end of this section). A fair comparison between published neural network implementations is difficult due to differences in process technology generation, network architectures, and computing workloads.

The main purpose of this paper is to establish a methodology for comparing various neural network hardware approaches and to understand the trends revealed through its development. In doing so, we strive to adhere to the following principles: a) general: a wide scope of technologies, devices, and circuits; b) simple: simple analytics are valued over precise simulations; c) uniform: consistent inputs and assumptions across multiple types of hardware; d) transparent: all models used are described, and the code is available [10] to the reader for verification.

Let us differentiate this work from the existing body of literature. We do not aim to give a literature review, and instead refer the reader to the excellent review papers in the neuromorphic hardware field [11,12], which do not attempt to quantitatively compare prior works as we do in this paper. Oftentimes, benchmarking refers to comparing various algorithms for an application, mainly based on their accuracy and with little reference to hardware implementation, e.g., [13]. In contrast, we compare different types of hardware implementing the same algorithm, focusing on energy consumption and performance.

The discussion of neural networks in the numerous papers currently being published remains at the architecture level. For example, the accuracy of recognition...
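To make the translation from device-level metrics (area, time, energy) to chip-level throughput and power concrete, the following is a minimal sketch in the spirit of the simple-analytics principle stated above. It is not the benchmarking code of [10]; the function name chip_level_estimate and all numerical values are hypothetical, chosen only for illustration.

```python
# Illustrative first-order roll-up (hypothetical numbers, not from the paper):
# given per-operation energy, delay, and area for a candidate device/circuit,
# estimate chip-level throughput, power, and area for a network of a given size.

def chip_level_estimate(ops_per_inference, energy_per_op_J, delay_per_op_s,
                        area_per_op_um2, parallel_units):
    """Analytic estimate from device-level metrics to chip-level figures of merit."""
    # Time for one inference: operations serialized over the available parallel units.
    time_per_inference_s = ops_per_inference * delay_per_op_s / parallel_units
    # Energy per inference scales with the total number of operations.
    energy_per_inference_J = ops_per_inference * energy_per_op_J
    throughput_inferences_per_s = 1.0 / time_per_inference_s
    power_W = energy_per_inference_J * throughput_inferences_per_s
    area_mm2 = parallel_units * area_per_op_um2 * 1e-6
    return {
        "throughput_inferences_per_s": throughput_inferences_per_s,
        "power_W": power_W,
        "area_mm2": area_mm2,
    }

# Example: a hypothetical 1M-operation inference mapped onto 10k parallel
# multiply-accumulate units with assumed 10 fJ energy and 1 ns delay per operation.
print(chip_level_estimate(ops_per_inference=1e6,
                          energy_per_op_J=10e-15,
                          delay_per_op_s=1e-9,
                          area_per_op_um2=1.0,
                          parallel_units=1e4))
```

Such a first-order roll-up deliberately ignores interconnect, memory access, and peripheral circuitry, which a complete benchmarking flow would need to account for; it only illustrates how per-operation metrics aggregate into the throughput and power figures that matter to the user.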