“…TCUs come under the guise of different marketing terms, be it NVIDIA's Tensor Cores [18], Google's Tensor Processing Unit [10], Intel KNL's AVX extensions [76], Apple A11's Neural Engine [2], or ARM's Machine Learning Processor [3]. TCUs are designed to accelerate Multilayer Perceptrons (MLP), Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), or Deep Neural Networks (DNN) in general. TCUs vary in implementation [18,36,40,43,48,54,71,74,75,76,79,87], and are prevalent [1,4,8,9,10,11,24,70] in edge devices, mobile, and the cloud.…”