Lip reading has witnessed unparalleled development in recent years thanks to deep learning and the availability of large-scale datasets. Despite the encouraging results achieved, the performance of lip reading, unfortunately, remains inferior to the one of its counterpart speech recognition, due to the ambiguous nature of its actuations that makes it challenging to extract discriminant features from the lip movement videos. In this paper, we propose a new method, termed as Lip by Speech (LIBS), of which the goal is to strengthen lip reading by learning from speech recognizers. The rationale behind our approach is that the features extracted from speech recognizers may provide complementary and discriminant clues, which are formidable to be obtained from the subtle movements of the lips, and consequently facilitate the training of lip readers. This is achieved, specifically, by distilling multi-granularity knowledge from speech recognizers to lip readers. To conduct this cross-modal knowledge distillation, we utilize an efficacious alignment scheme to handle the inconsistent lengths of the audios and videos, as well as an innovative filtering strategy to refine the speech recognizer's prediction. The proposed method achieves the new state-of-the-art performance on the CMLR and LRS2 datasets, outperforming the baseline by a margin of 7.66% and 2.75% in character error rate, respectively.
Multi-range-false-target (MRFT) jamming is particularly challenging for tracking radar due to the dense clutter and the repeated multiple false targets. The conventional association-based multi-target tracking (MTT) methods suffer from high computational complexity and limited usage in the presence of MRFT jamming. In order to solve the above problems, an efficient and adaptable probability hypothesis density (PHD) filter is proposed. Based on the gating strategy, the obtained measurements are firstly classified into the generalized newborn target and the existing target measurements. The two categories of measurements are independently used in the decomposed form of the PHD filter. Meanwhile, an amplitude feature is used to suppress the dense clutter. In addition, an MRFT jamming suppression algorithm is introduced to the filter. Target amplitude information and phase quantization information are jointly used to deal with MRFT jamming and the clutter by modifying the particle weights of the generalized newborn targets. Simulations demonstrate the proposed algorithm can obtain superior correct discrimination rate of MRFT, and high-accuracy tracking performance with high computational efficiency in the presence of MRFT jamming in the dense clutter.
Lip reading has witnessed unparalleled development in recent years thanks to deep learning and the availability of largescale datasets. Despite the encouraging results achieved, the performance of lip reading, unfortunately, remains inferior to the one of its counterpart speech recognition, due to the ambiguous nature of its actuations that makes it challenging to extract discriminant features from the lip movement videos. In this paper, we propose a new method, termed as Lip by Speech (LIBS), of which the goal is to strengthen lip reading by learning from speech recognizers. The rationale behind our approach is that the features extracted from speech recognizers may provide complementary and discriminant clues, which are formidable to be obtained from the subtle movements of the lips, and consequently facilitate the training of lip readers. This is achieved, specifically, by distilling multigranularity knowledge from speech recognizers to lip readers. To conduct this cross-modal knowledge distillation, we utilize an efficacious alignment scheme to handle the inconsistent lengths of the audios and videos, as well as an innovative filtering strategy to refine the speech recognizer's prediction. The proposed method achieves the new state-of-the-art performance on the CMLR and LRS2 datasets, outperforming the baseline by a margin of 7.66% and 2.75% in character error rate, respectively.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.