“…Another powerful family of techniques that has recently drawn increasing interest across the machine learning, computer vision, and speech technology communities to address this problem is low-bit DNN quantization [31]–[37], [52], [57], [58], [62], [74], [75]. By replacing floating-point DNN parameters with low-precision values, for example binary numbers, model sizes can be dramatically reduced without changing the DNN architecture [32], [57], [73].…”
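
As an illustrative sketch of the underlying idea (a generic symmetric binarization, assumed here for exposition rather than taken from any of the cited schemes), each full-precision weight is replaced by a single sign bit together with a shared per-layer scaling factor $\alpha$:
\[
\hat{w}_i = \alpha \,\operatorname{sign}(w_i), \qquad \alpha = \frac{1}{n}\sum_{i=1}^{n} |w_i| ,
\]
so that storing the $n$ weights of a layer requires roughly $n$ bits plus one floating-point scale instead of $32n$ bits, i.e. close to a $32\times$ reduction in model size while the network topology is left unchanged.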