A dialogue is essentially a multi-turn interaction among interlocutors, and effective evaluation metrics should reflect the dynamics of such interaction. Existing automatic metrics focus largely on turn-level quality and ignore these dynamics. To this end, we propose DynaEval, a unified automatic evaluation framework that is not only capable of performing turn-level evaluation, but also holistically considers the quality of the entire dialogue. In DynaEval, a graph convolutional network (GCN) is adopted to model a dialogue in its totality, where graph nodes denote individual utterances and edges represent the dependencies between pairs of utterances. A contrastive loss is then applied to distinguish well-formed dialogues from carefully constructed negative samples. Experiments show that DynaEval significantly outperforms the state-of-the-art dialogue coherence model and correlates strongly with human judgements across multiple dialogue evaluation aspects at both the turn and dialogue levels.
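As a rough illustration of the graph-based modelling described above, the sketch below builds a toy GCN over pre-computed utterance embeddings and pairs it with a margin-based contrastive loss. The class and function names, dimensions, and normalisation details are illustrative assumptions, not the released DynaEval implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UtteranceGCN(nn.Module):
    """Toy GCN that scores a dialogue from its utterance embeddings.

    Hypothetical sketch: node features are pre-computed utterance
    embeddings, and `adj` encodes dependencies between utterance pairs
    (e.g. speaker or temporal links). Not the authors' released code.
    """
    def __init__(self, dim=768, hidden=256):
        super().__init__()
        self.gc1 = nn.Linear(dim, hidden)
        self.gc2 = nn.Linear(hidden, hidden)
        self.scorer = nn.Linear(hidden, 1)

    def forward(self, x, adj):
        # Symmetrically normalise the adjacency: D^-1/2 (A + I) D^-1/2
        a = adj + torch.eye(adj.size(0))
        d = a.sum(1).pow(-0.5)
        a = d.unsqueeze(1) * a * d.unsqueeze(0)
        h = F.relu(self.gc1(a @ x))
        h = F.relu(self.gc2(a @ h))
        return self.scorer(h.mean(0))  # single dialogue-level score

def contrastive_loss(pos_score, neg_score, margin=1.0):
    """Margin loss pushing well-formed dialogues above perturbed ones."""
    return F.relu(margin - pos_score + neg_score).mean()
```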
Code-switch language modeling is challenging because of data scarcity and an expanded vocabulary spanning two languages. We present a novel computational method that generates synthetic code-switch data using the Matrix Language Frame theory to alleviate the data scarcity issue. The proposed method uses augmented parallel data to supplement the real code-switch data, and the synthetic data are used to pre-train the language model. We show that the pre-trained language model matches the performance of vanilla models when fine-tuned with 2.5 times less real code-switch data. We also show that the perplexity of an RNN-based language model pre-trained on synthetic code-switch data and fine-tuned on real code-switch data is significantly lower than that of a model trained on real code-switch data alone, and that the reduction in perplexity translates into a 1.45% absolute reduction in word error rate (WER) in a speech recognition experiment.
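A heavily simplified sketch of frame-based synthetic data generation is shown below: the matrix-language sentence supplies the grammatical frame, and aligned embedded-language tokens are swapped in at random. The `synth_code_switch` helper, its alignment format, and the switching probability are hypothetical; the paper's method applies the full Matrix Language Frame constraints rather than random substitution.

```python
import random

def synth_code_switch(matrix_sent, embedded_sent, alignment,
                      switch_prob=0.3, seed=0):
    """Rough illustration of frame-based code-switch generation.

    `matrix_sent` and `embedded_sent` are token lists of a parallel
    sentence pair, and `alignment` maps matrix-token indices to
    embedded-token indices (assumed to come from an external word
    aligner). The matrix language keeps the grammatical frame; aligned
    tokens are replaced with probability `switch_prob`.
    """
    rng = random.Random(seed)
    out = list(matrix_sent)
    for m_idx, e_idx in alignment.items():
        if rng.random() < switch_prob:
            out[m_idx] = embedded_sent[e_idx]
    return out

# Hypothetical usage with a toy English-Mandarin pair:
# synth_code_switch(["I", "eat", "rice"], ["我", "吃", "米饭"], {1: 1, 2: 2})
```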
In artistic gymnastics, the difficulty score, or D-score, is used to judge performance. Starting from zero, an athlete earns points for different aspects such as composition requirements, difficulty, and connections between moves; the final score is a composition of the quality of various performance indicators. Similarly, when evaluating dialogue responses, human judges generally follow a number of criteria, among which language fluency, context coherence, logical consistency, and semantic appropriateness are at the top of the agenda. In this paper, we propose an automatic dialogue evaluation framework called D-score that resembles the way gymnastics is judged. Following the four human judging criteria above, we devise a range of evaluation tasks and model them under a multi-task learning framework. The proposed framework, without relying on any human-written reference, learns to appreciate the overall quality of human-human conversations through a representation shared by all tasks, without over-fitting to any individual task domain. We evaluate D-score through comprehensive correlation analyses with human judgement on three dialogue evaluation datasets, two of which are from past DSTC series, and benchmark it against state-of-the-art baselines. D-score not only outperforms the best baseline by a large margin in terms of system-level Spearman correlation, but also represents an important step towards explainable dialogue scoring.
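The snippet below sketches one way a shared encoder with per-aspect heads could be wired up, in the spirit of the multi-task setup described above. The task names, aggregation weights, and the `MultiTaskScorer` API are assumptions for illustration and do not reproduce the D-score architecture.

```python
import torch.nn as nn

class MultiTaskScorer(nn.Module):
    """Sketch of a shared-encoder, multi-head dialogue scorer.

    `encoder` is any module mapping inputs to a (batch, hidden)
    representation; the four heads mirror the judging criteria named in
    the abstract. Placeholder architecture, not the paper's model.
    """
    def __init__(self, encoder, hidden=768,
                 tasks=("fluency", "coherence", "consistency", "appropriateness")):
        super().__init__()
        self.encoder = encoder
        self.heads = nn.ModuleDict({t: nn.Linear(hidden, 1) for t in tasks})

    def forward(self, inputs):
        shared = self.encoder(inputs)  # representation shared by all tasks
        return {t: head(shared).squeeze(-1) for t, head in self.heads.items()}

def overall_score(task_scores, weights=None):
    """Combine per-aspect scores into a single dialogue-level value."""
    weights = weights or {t: 1.0 for t in task_scores}
    total = sum(weights[t] * s for t, s in task_scores.items())
    return total / sum(weights.values())
```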
Despite significant progress in end-to-end (E2E) automatic speech recognition (ASR), E2E ASR for low-resourced code-switching (CS) speech has not been well studied. In this work, we describe an E2E ASR pipeline for recognizing CS speech in which a low-resourced language is mixed with a high-resourced language. Scarcity of acoustic data hinders the performance of E2E ASR systems more severely than it does conventional ASR systems. To mitigate this problem in the transcription of archives containing code-switching Frisian-Dutch speech, we integrate a designated decoding scheme and perform rescoring with neural network-based language models to better exploit the available textual resources. We first incorporate a multi-graph decoding approach that creates parallel search spaces for the monolingual and mixed recognition tasks to maximize the use of the textual resources from each language. Language model rescoring is then performed with a recurrent neural network pre-trained with cross-lingual embeddings and further adapted on the limited amount of in-domain CS text. The ASR experiments demonstrate the effectiveness of these techniques in improving the recognition performance of an E2E CS ASR system in a low-resourced scenario.
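Multi-graph decoding itself is tied to the decoder toolkit, but the second-pass language model rescoring step can be illustrated with a simple n-best interpolation, as sketched below. The `rescore_nbest` helper, its score scales, and the language model weight are hypothetical stand-ins rather than the system's actual rescoring pipeline.

```python
def rescore_nbest(nbest, lm_score_fn, lm_weight=0.5):
    """Minimal n-best rescoring sketch.

    `nbest` is a list of (hypothesis_tokens, first_pass_score) pairs and
    `lm_score_fn` returns a log-probability from the adapted RNN language
    model. Weights and score scales are illustrative only.
    """
    rescored = []
    for hyp, first_pass in nbest:
        total = first_pass + lm_weight * lm_score_fn(hyp)
        rescored.append((hyp, total))
    # Return the hypothesis with the best combined score.
    return max(rescored, key=lambda pair: pair[1])[0]
```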
Language modeling is the task of estimating the probability of a sequence of words. A bilingual language model is expected to model the sequential dependencies between words across languages, which is difficult owing to the inherent lack of suitable training data and the diverse syntactic structures of the two languages. We propose a bilingual attention language model (BALM) that jointly optimizes a language modeling objective and a quasi-translation objective to capture both monolingual and cross-lingual sequential dependencies. The attention mechanism learns the bilingual context from a parallel corpus. BALM achieves state-of-the-art performance on the SEAME code-switch database, reducing perplexity by 20.5% over the best reported result. We also apply BALM to bilingual lexicon induction and language normalization tasks to validate the idea.
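One plausible way to combine the two training signals is a weighted sum of a language-modeling loss and a quasi-translation loss, as sketched below. The interpolation weight `alpha` and the helper name are assumptions; the actual BALM objective and attention mechanism follow the paper.

```python
import torch.nn as nn

def joint_lm_translation_loss(lm_logits, lm_targets,
                              trans_logits, trans_targets,
                              alpha=0.5):
    """Illustrative combination of a language-modeling cross-entropy loss
    with an auxiliary quasi-translation cross-entropy loss.

    `alpha` is a made-up interpolation weight; the paper defines the
    actual objectives and their weighting.
    """
    ce = nn.CrossEntropyLoss(ignore_index=-100)
    lm_loss = ce(lm_logits.reshape(-1, lm_logits.size(-1)),
                 lm_targets.reshape(-1))
    trans_loss = ce(trans_logits.reshape(-1, trans_logits.size(-1)),
                    trans_targets.reshape(-1))
    return lm_loss + alpha * trans_loss
```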