Model quantization is a widely used technique to compress and accelerate deep neural network (DNN) inference. Emerging DNN hardware accelerators are beginning to support mixed precision (1-8 bits) to further improve computation efficiency, which raises the challenge of finding the optimal bitwidth for each layer: it requires domain experts to explore a vast design space, trading off accuracy, latency, energy, and model size, a process that is both time-consuming and usually sub-optimal. There are plenty of specialized hardware accelerators for neural networks, but little research has been done on designing specialized neural networks optimized for a particular hardware accelerator. The latter is demanding, given that the design cycle of silicon is much longer than that of neural networks. Conventional quantization algorithms ignore the differences between hardware architectures and quantize all the layers in a uniform way. In this paper, we introduce the Hardware-Aware Automated Quantization (HAQ) framework, which automatically determines the quantization policy while taking the hardware accelerator's feedback into the design loop. Rather than relying on proxy signals such as FLOPs and model size, we use direct hardware feedback (latency and energy) to guide the search for the quantization policy.
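To make the per-layer mixed-precision setting concrete, below is a minimal sketch (not the paper's implementation) of linearly quantizing each layer's weights to a chosen bitwidth, the basic operation whose per-layer bitwidths a hardware-aware search would assign. The symmetric max-abs scaling, the toy layer shapes, and the example bitwidth policy are illustrative assumptions.

```python
import numpy as np

def linear_quantize(w: np.ndarray, n_bits: int) -> np.ndarray:
    """Quantize weights to a symmetric signed n_bits grid (assumption: max-abs scaling)."""
    assert n_bits >= 2, "this sketch assumes at least 2 bits for signed weights"
    q_max = 2 ** (n_bits - 1) - 1              # e.g. 127 for 8 bits, 7 for 4 bits
    scale = np.abs(w).max() / q_max            # map the largest weight onto the grid edge
    q = np.clip(np.round(w / scale), -q_max, q_max)
    return q * scale                           # de-quantize back to float for simulation

# Toy example: different bitwidths per layer, as a mixed-precision policy would assign.
layers = {
    "conv1": np.random.randn(64, 3, 3, 3),
    "conv2": np.random.randn(128, 64, 3, 3),
    "fc":    np.random.randn(1000, 128),
}
policy = {"conv1": 8, "conv2": 4, "fc": 6}     # hypothetical per-layer bitwidths

quantized = {name: linear_quantize(w, policy[name]) for name, w in layers.items()}

# A hardware-aware search would score each candidate policy with measured or
# simulated latency/energy plus task accuracy, rather than a proxy such as FLOPs.
model_bits = sum(w.size * policy[name] for name, w in layers.items())
print(f"quantized weight storage: {model_bits / 8 / 1e6:.2f} MB")
```

The sketch only shows how a per-layer bitwidth assignment changes the quantized weights and the resulting storage cost; the search over such assignments, driven by direct hardware feedback, is what the HAQ framework automates.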