“…One area uses uniform-precision quantization, where all layers of the model share the same precision (Choukroun et al., 2019; Gong et al., 2019; Langroudi et al., 2019; Jin et al., 2020a; Bhalgat et al., 2020; Darvish Rouhani et al., 2020; Oh et al., 2021). Another direction studies mixed-precision quantization, which determines the bit-width of each layer through search algorithms, aiming at a better accuracy-efficiency trade-off (Dong et al., 2019; Wang et al., 2019; Habi et al., 2020; Fu et al., 2020; Yang & Jin, 2020; Zhao et al., 2021a;b; Ma et al., 2021b). There are also binarized networks, which use only 1-bit precision (Rastegari et al., 2016; Hubara et al., 2016; Cai et al., 2017; Bulat et al., 2020).…”