“…To deal with the large footprint of models that use complex network architectures, ensembles of multiple models, or ensembles of multiple spectrogram inputs, pruning [21], [22], [25], [26] and quantization [23], [25] techniques have been widely applied. While quantization can reduce a model to 1/4 of its original size (i.e., each trainable parameter stored as a 32-bit floating-point number is quantized to an 8-bit integer [27]), pruning has been shown to reduce models to 1/10 of their original size [25], [26].…”
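The 4x reduction from 32-bit floating point to 8-bit integers can be illustrated with a minimal affine-quantization sketch. This is a generic min–max scheme for illustration only, not the specific method of [27]; the function names and the choice of per-tensor scaling are assumptions.

```python
import numpy as np

def quantize_int8(w):
    """Affine-quantize a float32 array to int8 (illustrative min-max scheme).

    Maps the range [w.min(), w.max()] onto the int8 range [-128, 127],
    so each 32-bit parameter is stored in 8 bits: a 4x size reduction.
    """
    w = np.asarray(w, dtype=np.float32)
    rng = float(w.max() - w.min())
    scale = rng / 255.0 if rng > 0 else 1.0  # guard against constant arrays
    zero_point = np.round(-128.0 - w.min() / scale)
    q = np.clip(np.round(w / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate float32 values from the int8 representation."""
    return (q.astype(np.float32) - zero_point) * scale

w = np.linspace(-1.0, 1.0, 11).astype(np.float32)
q, scale, zp = quantize_int8(w)
w_hat = dequantize(q, scale, zp)
# q.nbytes is exactly 1/4 of w.nbytes; the reconstruction error per
# element is bounded by roughly one quantization step (the scale).
```

Pruning, by contrast, reaches larger ratios (e.g., 1/10 of the original size) by zeroing out or removing low-importance weights rather than narrowing their numeric format, which is why the two techniques are often combined.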