Recent progress in semantic parsing has scarcely considered languages other than English, and professional translation of training data can be prohibitively expensive. We adapt a semantic parser trained on a single language, such as English, to new languages and multiple domains with minimal annotation. We ask whether machine translation (MT) is an adequate substitute for training data, and extend this to investigate bootstrapping using joint training with English, paraphrasing, and multilingual pre-trained models. We develop a Transformer-based parser that combines paraphrases by ensembling attention over multiple encoders, and present new versions of ATIS and Overnight in German and Chinese for evaluation. Experimental results indicate that MT can approximate the training data needed for accurate parsing in a new language when augmented with paraphrasing through multiple MT engines. Where MT is inadequate, we find that our approach achieves parsing accuracy within 2% of full translation while using only 50% of the training data.
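As a rough illustration of the multi-encoder design described above, the following sketch (in PyTorch; all module and parameter names are hypothetical, not taken from the paper) ensembles decoder cross-attention over several encoders, one per paraphrase or MT engine output, by averaging the attended contexts:

    # Minimal sketch of ensembling decoder attention over multiple encoders;
    # names are illustrative assumptions, not the paper's implementation.
    import torch
    import torch.nn as nn

    class MultiEncoderAttention(nn.Module):
        def __init__(self, d_model, n_heads, n_encoders):
            super().__init__()
            # One cross-attention block per encoded paraphrase source.
            self.attns = nn.ModuleList(
                nn.MultiheadAttention(d_model, n_heads, batch_first=True)
                for _ in range(n_encoders)
            )

        def forward(self, decoder_state, encoder_outputs):
            # decoder_state: (batch, tgt_len, d_model)
            # encoder_outputs: list of (batch, src_len, d_model) tensors,
            # one per MT engine / paraphrase of the input utterance.
            contexts = [
                attn(decoder_state, mem, mem)[0]
                for attn, mem in zip(self.attns, encoder_outputs)
            ]
            # Ensemble by averaging the attended contexts.
            return torch.stack(contexts).mean(dim=0)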
Nonlinearity mitigation using digital signal processing has been shown to increase the achievable data rates of optical fiber transmission links. One especially effective technique is digital backpropagation (DBP), an algorithm capable of simultaneously compensating for linear and nonlinear channel distortions. The most significant barrier to implementing this technique, however, is its high computational complexity. In recent years, several reduced-complexity alternatives to DBP have been proposed, although such techniques have not demonstrated performance benefits commensurate with their implementation complexity. To fully characterize the computational requirements of DBP, there is a need to model the algorithm's behavior when constrained to the logic used in a digital coherent receiver. Such a model allows any signal recovery algorithm to be analyzed in terms of true hardware complexity, which, crucially, includes the bit depth of the multiplication operation. With a limited bit depth, quantization noise is introduced with each arithmetic operation, and it can no longer be assumed that the conventional DBP algorithm will outperform its low-complexity alternatives. In this work, DBP and a single-nonlinear-step DBP implementation, the Enhanced Split-Step Fourier Method (ESSFM), were compared with linear equalization using a generic software model of fixed-point hardware. The requirements in bit depth and fast Fourier transform (FFT) size are discussed to examine the optimal operating regimes for these two schemes of digital nonlinearity compensation. For a 1000 km transmission system, it was found that, assuming an optimized FFT size, the ESSFM algorithm outperformed conventional DBP in terms of SNR for all hardware resolutions up to 13 bits.
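To illustrate the kind of fixed-point constraint described above, here is a minimal sketch (in Python/NumPy; the rounding scheme and ranges are illustrative assumptions, not the paper's hardware model) of how each arithmetic operation at a limited bit depth injects quantization noise:

    # Minimal sketch of fixed-point arithmetic with quantization noise;
    # parameters are illustrative only.
    import numpy as np

    def quantize(x, bits, max_abs=1.0):
        # Round x onto a signed fixed-point grid with the given bit depth.
        step = 2.0 * max_abs / (2 ** bits)
        return np.clip(np.round(x / step) * step, -max_abs, max_abs - step)

    def fixed_point_multiply(a, b, bits):
        # Each multiplication is requantized, injecting quantization noise.
        return quantize(quantize(a, bits) * quantize(b, bits), bits)

    # Noise power grows as bit depth shrinks:
    x = np.random.uniform(-1, 1, 10_000)
    for bits in (8, 10, 12, 14):
        err = fixed_point_multiply(x, x, bits) - x * x
        print(bits, "bits -> quantization error power:", np.mean(err ** 2))

Sweeping the bit depth in this way against a floating-point reference mirrors the style of SNR-versus-resolution analysis reported in the abstract.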
Recent work in cross-lingual semantic parsing has successfully applied machine translation to localize accurate parsing to new languages. However, these advances assume access to high-quality machine translation systems and tools such as word aligners for all test languages. We remove these assumptions and study cross-lingual semantic parsing as a zero-shot problem, without parallel data for 7 test languages (DE, ZH, FR, ES, PT, HI, TR). We propose a multi-task encoder-decoder model to transfer parsing knowledge to additional languages using only English-logical form paired data and unlabeled, monolingual utterances in each test language. We train an encoder to generate language-agnostic representations, jointly optimized for generating logical forms or reconstructing utterances, and adversarially trained against language discriminability. Our system frames zero-shot parsing as a latent-space alignment problem and finds that pre-trained models can be improved to generate logical forms with minimal cross-lingual transfer penalty. Experimental results on Overnight and a new executable version of MultiATIS++ show that our zero-shot approach outperforms back-translation baselines and, in some cases, approaches the supervised upper bound.
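A common way to train against language discriminability is gradient reversal; the sketch below (PyTorch; names are hypothetical, and the paper's exact adversarial setup may differ) shows a language discriminator whose reversed gradients push the shared encoder toward language-agnostic representations:

    # Minimal sketch of adversarial training against language
    # discriminability via gradient reversal; an illustrative assumption,
    # not necessarily the paper's objective.
    import torch
    from torch import nn

    class GradReverse(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x):
            return x.view_as(x)

        @staticmethod
        def backward(ctx, grad):
            # Flip gradients so the encoder learns to fool the discriminator.
            return -grad

    class LanguageDiscriminator(nn.Module):
        def __init__(self, d_model, n_langs):
            super().__init__()
            self.clf = nn.Linear(d_model, n_langs)

        def forward(self, encoder_states):
            # Mean-pool token states, reverse gradients, predict the language.
            # Training this classifier end-to-end therefore penalizes any
            # language-identifying signal left in the encoder output.
            pooled = encoder_states.mean(dim=1)  # (batch, d_model)
            return self.clf(GradReverse.apply(pooled))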
Automatic machine translation (MT) metrics are widely used to distinguish the quality of machine translation systems across large test sets (i.e., system-level evaluation). However, it is unclear whether automatic metrics can reliably distinguish good translations from bad ones at the sentence level (i.e., segment-level evaluation). We investigate how useful MT metrics are at detecting segment-level quality by correlating metric scores with translation utility for downstream tasks. We evaluate the segment-level performance of widely used MT metrics (chrF, COMET, BERTScore, etc.) on three downstream cross-lingual tasks (dialogue state tracking, question answering, and semantic parsing). For each task, we have access to a monolingual task-specific model and a translation model. We calculate the correlation between a metric's ability to predict a good or bad translation and success or failure on the final task for machine-translated test sentences. Our experiments demonstrate that all metrics exhibit negligible correlation with this extrinsic evaluation of downstream outcomes. We also find that the scores produced by neural metrics are not interpretable, in large part because their ranges are undefined. We synthesise our analysis into recommendations for future MT metrics to produce labels rather than scores, enabling more informative interaction between machine translation and multilingual language understanding.
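As an illustration of this segment-level extrinsic evaluation, the following sketch (Python/SciPy; the scores and outcomes are made-up placeholders, and the paper's exact correlation statistic may differ) correlates per-segment metric scores with binary downstream success using a point-biserial correlation:

    # Minimal sketch of correlating metric scores with downstream outcomes;
    # the data below are placeholders, not experimental results.
    from scipy.stats import pointbiserialr

    metric_scores = [0.81, 0.42, 0.93, 0.55, 0.67]  # e.g., COMET per segment
    task_success = [1, 0, 1, 0, 1]                  # downstream outcome per segment

    r, p = pointbiserialr(task_success, metric_scores)
    print(f"point-biserial r = {r:.3f} (p = {p:.3f})")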