“…quantization [30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45], and through compression, i.e., sparsity/pruning [46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64], enables us to optimize the NN significantly. In quantization, low precision is used to represent the weights and activations, whereas in pruning, connections are removed entirely.…”
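To make the distinction between the two approaches concrete, the following is a minimal NumPy sketch, not drawn from any of the cited works: uniform symmetric quantization maps float weights to low-precision integers plus a scale, while magnitude pruning zeroes out (removes) the smallest-magnitude connections. The function names `quantize_uniform` and `prune_by_magnitude` and the parameter choices are illustrative assumptions.

```python
import numpy as np

def quantize_uniform(w, num_bits=8):
    """Map a float tensor to num_bits signed integers plus a scale (symmetric)."""
    qmax = 2 ** (num_bits - 1) - 1              # e.g. 127 for int8
    scale = np.max(np.abs(w)) / qmax            # largest weight maps to qmax
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q.astype(np.int8), scale             # dequantize later as q * scale

def prune_by_magnitude(w, sparsity=0.5):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    threshold = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) < threshold, 0.0, w)

w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_uniform(w)                  # low-precision representation
w_dequant = q.astype(np.float32) * scale        # approximate reconstruction
w_sparse = prune_by_magnitude(w, sparsity=0.5)  # half the connections removed
```

Note the complementary trade-offs the sketch exposes: quantization keeps every connection but represents each one coarsely, while pruning keeps full precision on the surviving weights but discards connections outright.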