“…By adapting to the quantization operation through retraining the weights, QAT yields a model quantized to an ultra-low bit width while maintaining its performance, although it relies on the complete training dataset. The main research directions of QAT include quantizer design [26], training strategies [27], gradient approximation [28], and binary networks [29]. In contrast, PTQ achieves model quantization using only very limited data.…”
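To make the QAT idea in the passage above concrete, the following minimal PyTorch sketch shows weights being "fake-quantized" (quantized and immediately dequantized) on every forward pass, with a straight-through estimator standing in for the non-differentiable rounding so the full-precision weights can still be relearned. This is a generic illustration under common assumptions (a symmetric uniform quantizer and an STE); the class names `FakeQuantSTE` and `QuantLinear` are hypothetical and do not correspond to the specific methods of [26]-[29].

```python
# Sketch only: a symmetric uniform weight quantizer trained with a
# straight-through estimator (STE); not the method of any cited work.
import torch
import torch.nn as nn


class FakeQuantSTE(torch.autograd.Function):
    """Quantize-dequantize weights in forward; pass gradients straight through."""

    @staticmethod
    def forward(ctx, w, num_bits):
        qmax = 2 ** (num_bits - 1) - 1                # symmetric signed range
        scale = w.abs().max().clamp(min=1e-8) / qmax  # per-tensor scale (assumption)
        w_q = torch.round(w / scale).clamp(-qmax - 1, qmax)
        return w_q * scale                            # "fake-quantized" weights

    @staticmethod
    def backward(ctx, grad_output):
        # STE: treat rounding as the identity so the weights keep learning
        return grad_output, None


class QuantLinear(nn.Linear):
    """Linear layer whose weights are fake-quantized on every forward pass."""

    def __init__(self, in_features, out_features, num_bits=4):
        super().__init__(in_features, out_features)
        self.num_bits = num_bits

    def forward(self, x):
        w_q = FakeQuantSTE.apply(self.weight, self.num_bits)
        return nn.functional.linear(x, w_q, self.bias)


if __name__ == "__main__":
    # Toy training step: the full-precision weights are updated so that their
    # low-bit counterparts fit the data, which is what QAT relies on and why
    # it needs access to the training set.
    layer = QuantLinear(8, 2, num_bits=4)
    opt = torch.optim.SGD(layer.parameters(), lr=0.1)
    x, y = torch.randn(32, 8), torch.randn(32, 2)
    loss = nn.functional.mse_loss(layer(x), y)
    loss.backward()                                   # gradients flow via the STE
    opt.step()
```

PTQ, by contrast, would skip the training loop entirely and only estimate the quantization scale from a small calibration set, which is why it needs far less data but typically tolerates less aggressive bit widths.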