Face detection and recognition are studied extensively for their wide applications in security, biometrics, healthcare, and marketing. As a step towards an accurate solution to this problem, this paper proposes a face detection and recognition pipeline, the face detection and recognition EmbedNet (FDREnet). The proposed FDREnet performs face detection using histograms of oriented gradients and applies the Siamese technique with contrastive loss to train a deep learning architecture (EmbedNet). This approach allows the EmbedNet to learn to distinguish facial features rather than merely recognize them, and this flexibility in learning due to contrastive loss accounts for better accuracy than traditional deep learning losses. The embeddings produced by the trained FDREnet achieve accuracies of 98.03%, 99.57%, and 99.39% on the face94, face95, and face96 datasets, respectively, with an SVM classifier, and accuracies of 97.83%, 99.57%, and 99.39% on the same datasets with a KNN classifier.
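The contrastive loss mentioned above pulls embeddings of same-identity pairs together and pushes different-identity pairs apart up to a margin. A minimal sketch of this loss in NumPy is shown below; the function name, label convention (y = 0 for similar pairs, y = 1 for dissimilar pairs), and default margin are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def contrastive_loss(e1, e2, y, margin=1.0):
    """Contrastive loss for a pair of embeddings (sketch, Hadsell-style form).

    e1, e2 : embedding vectors for the two faces in the pair
    y      : 0 if the pair shows the same identity, 1 if different (assumed convention)
    margin : dissimilar pairs farther apart than this contribute zero loss
    """
    d = np.linalg.norm(e1 - e2)  # Euclidean distance between embeddings
    # Similar pairs are penalized by squared distance;
    # dissimilar pairs are penalized only while closer than the margin.
    return 0.5 * ((1 - y) * d**2 + y * max(margin - d, 0.0) ** 2)
```

Under this formulation, a same-identity pair with identical embeddings incurs zero loss, as does a different-identity pair already separated by more than the margin.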
Spoken Language Understanding (SLU) systems parse speech into semantic structures such as dialog acts and slots. This involves using an Automatic Speech Recognizer (ASR) to transcribe speech into multiple text alternatives (hypotheses). Transcription errors, common in ASRs, negatively impact downstream SLU performance. Approaches to mitigating such errors use richer information from the ASR, either in the form of N-best hypotheses or word lattices. We hypothesize that transformer models learn better with a simpler utterance representation: the concatenation of the N-best ASR alternatives, with each alternative separated by a special delimiter [SEP]. In our work, we test this hypothesis by using concatenated N-best ASR alternatives as the input to transformer encoder models, namely BERT and XLM-RoBERTa, and achieve performance equivalent to the prior state-of-the-art model on the DSTC2 dataset. We also show that our approach significantly outperforms the prior state of the art in a low-data regime. Additionally, this methodology is accessible to users of third-party ASR APIs that do not provide word-lattice information.
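The input representation described above can be built with a simple string join before tokenization. A minimal sketch follows; the function name is an illustrative assumption, and the [SEP] literal stands in for the encoder's separator token (a real pipeline would typically let the tokenizer, e.g. a BERT tokenizer from the `transformers` library, insert its own special tokens).

```python
def concat_nbest(hypotheses, sep="[SEP]"):
    """Flatten N-best ASR hypotheses into one utterance string.

    Each alternative transcription is separated by the delimiter token,
    yielding a single sequence a transformer encoder can consume.
    """
    return f" {sep} ".join(h.strip() for h in hypotheses)
```

For example, the 2-best list ["book a table", "book the table"] becomes the single input "book a table [SEP] book the table".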
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations: citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.