“…Teacher-student (T/S) learning [1,2] has been widely applied to a variety of deep learning tasks in speech, language, and image processing, including model compression [1,2], domain adaptation [3,4,5], small-footprint neural machine translation (NMT) [6], low-resource NMT [7], far-field automatic speech recognition (ASR) [8,9], low-resource language ASR [10], and neural network pre-training [11]. T/S learning falls within the category of transfer learning: the network of interest, the student, is trained to mimic the behavior of a well-trained network, the teacher, when both are presented with the same or stereo training samples.…”
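
The "mimicking" objective described above can be illustrated with a minimal sketch. It assumes, as one common choice that the excerpt itself does not specify, that the student is trained to minimize the Kullback-Leibler divergence between the teacher's and the student's output distributions. The function names `softmax` and `ts_loss` and the example logits are hypothetical, chosen only for illustration.

```python
import math


def softmax(logits):
    """Convert raw scores to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ts_loss(teacher_logits, student_logits):
    """KL(teacher || student) between the two output distributions.

    Assumption: KL divergence is used as the mimicry criterion; the
    excerpt does not name a specific loss.
    """
    p = softmax(teacher_logits)  # teacher posteriors (fixed targets)
    q = softmax(student_logits)  # student posteriors (to be trained)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical outputs for one training pair. In the "stereo" setting the
# teacher would see one version of a sample (e.g. close-talk speech) and
# the student its parallel counterpart (e.g. the far-field recording).
teacher_logits = [2.0, 0.5, -1.0]
student_logits = [1.0, 1.0, 0.0]

loss = ts_loss(teacher_logits, student_logits)
print(f"T/S loss: {loss:.4f}")
```

The loss is zero only when the student reproduces the teacher's distribution exactly, so minimizing it over the training pairs pushes the student toward the teacher's behavior without requiring ground-truth labels.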