A switched-capacitor matrix multiplier is presented for approximate computing and machine learning applications. The multiply-and-accumulate operations perform discrete-time charge-domain signal processing using passive switches and 300 aF unit capacitors, and the computation is digitized with a 6b asynchronous SAR ADC. Analyses of incomplete charge accumulation and thermal noise are presented. The design was fabricated in 40 nm CMOS, and multiplication is characterized experimentally using matched filtering and image convolutions to analyze noise and offset. Two applications are highlighted: 1) an energy-efficient feature-extraction layer that performs both compression and classification in a neural network for an analog front end, and 2) analog acceleration for solving optimization problems that are traditionally handled in the digital domain. The chip achieves measured efficiencies of 8.7 TOPS/W at 1 GHz for the first application and 7.7 TOPS/W at 2.5 GHz for the second.

Keywords: analog computing, approximate computing, neural networks, matched filtering, matrix factorization, switched-capacitor circuits

1 Introduction

Matrix multiplication is the fundamental operation y = Ax, in which an input x ∈ R^n is mapped to an output y ∈ R^m by a linear system A. It is used ubiquitously in scientific computing, computer graphics, machine learning, real-time signal processing, and optimization. In hardware, matrix multiplication is traditionally realized by multiply-and-accumulate (MAC) units, which are common in general-purpose graphics processing units, field-programmable gate arrays, and application-specific integrated circuits (a minimal software sketch of the MAC formulation appears at the end of this section). Three important parameters in matrix multiplication are computation speed (e.g., throughput), energy efficiency, and resolution. For example, while high computation speed is of utmost importance in scientific computing and graphics, energy efficiency plays a more significant role in embedded systems. High resolution, on the other hand, is used to obtain high accuracy in computational simulations [1].

There has been recent work on reduced-precision multiplication for statistical inference systems optimized for energy-efficient operation. These applications operate on inherently noisy data and perform tasks, such as classification and recognition, that are resilient to low signal-to-noise ratio (SNR). These fundamental ideas are the motivating forces behind reduced-precision, or approximate, computing. Such systems include classification systems for images and audio and supervised training in machine learning [2, 3, 4, 5, 6, 7]. For example, the work of [4] shows that inference performance for neural networks remains robust at 8b fixed point (see the quantization sketch at the end of this section). Inference, in the context of image recognition, entails predicting the label of an image using programmable weights (e.g., the elements of the matrix A) that were trained offline. The works of [5, 6] show that resolutions for state-of-the-art networks [8] for the ImageNet Challenge [9] can go below 4b. The ability for these systems...
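For concreteness, the following is a minimal sketch of the MAC formulation of y = Ax referenced above. It is illustrative plain Python (the function name mac_matvec and the sample values are our own, not taken from the measured design): each output element is produced by a single accumulator that sums the products A[i][j] * x[j].

def mac_matvec(A, x):
    # y = A x for an m-by-n matrix A and a length-n vector x.
    m, n = len(A), len(x)
    y = [0.0] * m
    for i in range(m):              # one output element per row of A
        acc = 0.0                   # accumulator of a single MAC unit
        for j in range(n):
            acc += A[i][j] * x[j]   # one multiply-and-accumulate step
        y[i] = acc
    return y

A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
x = [1.0, 0.5, -1.0]
print(mac_matvec(A, x))             # prints [-1.0, 0.5]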
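To make the reduced-precision discussion concrete, the sketch below applies uniform fixed-point quantization to a weight vector at 8b and 4b. This is an illustrative scheme of our own, not the exact quantization used in [4] or [5, 6]: weights are mapped to signed b-bit integers sharing one per-tensor scale factor, and the reconstruction error grows as the bit width shrinks.

def quantize(w, bits):
    # Map reals to signed `bits`-bit integers with a shared scale
    # (illustrative per-tensor scheme, not the scheme of [4] or [5, 6]).
    qmax = 2 ** (bits - 1) - 1             # e.g. 127 for 8b, 7 for 4b
    scale = max(abs(v) for v in w) / qmax  # per-tensor scale factor
    return [round(v / scale) for v in w], scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.82, -0.31, 0.05, -0.97]
for bits in (8, 4):
    q, s = quantize(weights, bits)
    approx = dequantize(q, s)
    err = max(abs(a - b) for a, b in zip(weights, approx))
    print("%db: q=%s, max abs error=%.4f" % (bits, q, err))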