Inference at the edge using embedded machine learning models involves challenging trade-offs between resource metrics, such as energy and memory footprint, and performance metrics, such as computation time and accuracy. In this work, we go beyond conventional Neural Network based approaches to explore the Tsetlin Machine (TM), an emerging machine learning algorithm that uses learning automata to create propositional logic for classification. We use algorithm-hardware co-design to propose a novel methodology for TM training and inference. The methodology, called REDRESS, comprises independent TM training and inference techniques that reduce the memory footprint of the resulting automata to target low- and ultra-low-power applications. The array of Tsetlin Automata (TA) holds the learned information in binary form as bits {0, 1}, called excludes and includes, respectively. REDRESS proposes a lossless TA compression method, called include-encoding, that stores only the information associated with includes to achieve over 99% compression. This is enabled by a novel, computationally minimal training procedure, called Tsetlin Automata Re-profiling, that improves accuracy and increases the sparsity of the TA, reducing the number of includes and hence the memory footprint. Finally, REDRESS includes an inherently bit-parallel inference algorithm that operates on the optimally trained TA in the compressed domain, without requiring decompression at runtime, to obtain high speedups compared with state-of-the-art Binary Neural Network (BNN) models. We demonstrate that, using the REDRESS approach, TM outperforms BNN models on all design metrics for five benchmark datasets: MNIST, CIFAR2, KWS6, Fashion-MNIST, and Kuzushiji-MNIST. When implemented on an STM32F746G-DISCO microcontroller, REDRESS obtained speedups and energy savings ranging from 5× to 5700× compared with different BNN models.
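
The following is a minimal C sketch, not taken from REDRESS itself, illustrating the general idea behind include-encoding and compressed-domain clause evaluation: because the TA bit vector is sparse after training, only the indices of the includes are stored, and the clause (a conjunction of included literals) is evaluated directly from that index list without decompression. All identifiers (ta_bits, include_idx, eval_clause, the toy vector sizes) are hypothetical and chosen for illustration only.

```c
#include <stdio.h>
#include <stdint.h>

/* Toy example: one clause over NUM_LITERALS Boolean literals.
 * ta_bits is the raw TA decision vector (1 = include, 0 = exclude);
 * after sparsity-oriented training it is mostly zeros. */
#define NUM_LITERALS 16

static const uint8_t ta_bits[NUM_LITERALS] = {
    0,0,1,0, 0,0,0,0, 0,1,0,0, 0,0,0,1
};

/* Compressed representation: indices of the includes only. */
static uint8_t include_idx[NUM_LITERALS];
static int num_includes = 0;

static void compress_includes(void) {
    for (int i = 0; i < NUM_LITERALS; i++)
        if (ta_bits[i]) include_idx[num_includes++] = (uint8_t)i;
}

/* Clause output: AND of the included literals, computed directly on the
 * compressed index list (no decompression of the full bit vector). */
static int eval_clause(const uint8_t *literals) {
    for (int k = 0; k < num_includes; k++)
        if (!literals[include_idx[k]]) return 0;  /* any 0 literal falsifies */
    return 1;
}

int main(void) {
    /* Example Booleanized input literals (features and their negations). */
    uint8_t literals[NUM_LITERALS] = {
        1,1,1,0, 1,0,1,1, 0,1,1,0, 1,1,0,1
    };
    compress_includes();
    printf("includes stored: %d of %d TA bits\n", num_includes, NUM_LITERALS);
    printf("clause output  : %d\n", eval_clause(literals));
    return 0;
}
```

In a real deployment the per-clause outputs would additionally be summed per class and the argmax taken, and the index list could be packed into machine words to exploit the bit-parallelism described above; this sketch only conveys the storage and evaluation principle.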