A switched-capacitor matrix multiplier is presented for approximate computing and machine learning applications. The multiply-and-accumulate operations perform discrete-time charge-domain signal processing using passive switches and 300 aF unit capacitors, and the computation is digitized with a 6b asynchronous SAR ADC. Analyses of incomplete charge accumulation and thermal noise are presented. The design was fabricated in 40 nm CMOS, and multiplication is characterized experimentally using matched filtering and image convolutions to analyze noise and offset. Two applications are highlighted: 1) an energy-efficient feature-extraction layer that performs both compression and classification in a neural network for an analog front end, and 2) analog acceleration for solving optimization problems that are traditionally handled in the digital domain. The chip achieves measured efficiencies of 8.7 TOPS/W at 1 GHz for the first application and 7.7 TOPS/W at 2.5 GHz for the second.

Keywords: analog computing, approximate computing, neural networks, matched filtering, matrix factorization, switched-capacitor circuits

1 Introduction

Matrix multiplication is the fundamental operation y = Ax, in which an input x ∈ R^n is mapped to an output y ∈ R^m by a linear system A. It is used ubiquitously in scientific computing, computer graphics, machine learning, real-time signal processing, and optimization. In hardware, matrix multiplication is traditionally realized by multiply-and-accumulate (MAC) units, which are common in general-purpose graphics processing units, field-programmable gate arrays, and application-specific integrated circuits (a minimal software sketch of the MAC formulation appears at the end of this section). Three important parameters in matrix multiplication are computation speed (e.g., throughput), energy efficiency, and resolution. For example, while high computation speed is of utmost importance in scientific computing and graphics, energy efficiency plays a more significant role in embedded systems. High resolution, on the other hand, is used to obtain high accuracy in computational simulations [1].

There has been recent work on reduced-precision multiplication for statistical inference systems optimized for energy-efficient operation. These applications operate on inherently noisy data and perform tasks, such as classification and recognition, that are resilient to low signal-to-noise ratio (SNR). These fundamental ideas are the motivating forces behind reduced-precision, or approximate, computing. Such systems include classification systems for images and audio and supervised training in machine learning [2, 3, 4, 5, 6, 7]. For example, the work of [4] shows that inference performance for neural networks remains robust at 8b fixed point (see the quantization sketch at the end of this section). Inference, in the context of image recognition, entails predicting the label of an image using programmable weights (e.g., the elements of the matrix A) that were trained offline. The works of [5, 6] show that resolutions for state-of-the-art networks [8] for the ImageNet Challenge [9] can go below 4b. The ability for these systems...
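For concreteness, the following is a minimal sketch of the MAC formulation of y = Ax referenced above. It is illustrative plain Python (the function name mac_matvec and the sample values are our own, not taken from the measured design): each output element is produced by a single accumulator that sums the products A[i][j] * x[j].

def mac_matvec(A, x):
    # y = A x for an m-by-n matrix A and a length-n vector x.
    m, n = len(A), len(x)
    y = [0.0] * m
    for i in range(m):              # one output element per row of A
        acc = 0.0                   # accumulator of a single MAC unit
        for j in range(n):
            acc += A[i][j] * x[j]   # one multiply-and-accumulate step
        y[i] = acc
    return y

A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
x = [1.0, 0.5, -1.0]
print(mac_matvec(A, x))             # prints [-1.0, 0.5]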
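To make the reduced-precision discussion concrete, the sketch below applies uniform fixed-point quantization to a weight vector at 8b and 4b. This is an illustrative scheme of our own, not the exact quantization used in [4] or [5, 6]: weights are mapped to signed b-bit integers sharing one per-tensor scale factor, and the reconstruction error grows as the bit width shrinks.

def quantize(w, bits):
    # Map reals to signed `bits`-bit integers with a shared scale
    # (illustrative per-tensor scheme, not the scheme of [4] or [5, 6]).
    qmax = 2 ** (bits - 1) - 1             # e.g. 127 for 8b, 7 for 4b
    scale = max(abs(v) for v in w) / qmax  # per-tensor scale factor
    return [round(v / scale) for v in w], scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.82, -0.31, 0.05, -0.97]
for bits in (8, 4):
    q, s = quantize(weights, bits)
    approx = dequantize(q, s)
    err = max(abs(a - b) for a, b in zip(weights, approx))
    print("%db: q=%s, max abs error=%.4f" % (bits, q, err))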