In this work, we study hallucinations in Neural Machine Translation (NMT), which lie at an extreme end on the spectrum of NMT pathologies. Firstly, we connect the phenomenon of hallucinations under source perturbation to the Long-Tail theory of Feldman (2020), and present an empirically validated hypothesis that explains hallucinations under source perturbation. Secondly, we consider hallucinations under corpus-level noise (without any source perturbation) and demonstrate that two prominent types of natural hallucinations (detached and oscillatory outputs) can be generated and explained through specific corpus-level noise patterns. Finally, we elucidate the phenomenon of hallucination amplification in popular data-generation processes such as Backtranslation and sequence-level Knowledge Distillation. We have released the datasets and code to replicate our results at https://github.com/vyraun/hallucinations.
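A common way to surface hallucinations under source perturbation is to insert a rare token into the source and check whether the output diverges far more than the perturbation warrants. The sketch below illustrates this idea only; the `translate` callable, the similarity threshold, and the brittle toy model are all hypothetical stand-ins, not the paper's actual detection protocol.

```python
from difflib import SequenceMatcher

def is_hallucination(translate, src, perturbed_src, threshold=0.3):
    """Flag a hallucination when a small source perturbation causes the
    translation to diverge drastically from the unperturbed output."""
    base = translate(src)
    perturbed = translate(perturbed_src)
    similarity = SequenceMatcher(None, base.split(), perturbed.split()).ratio()
    return similarity < threshold

# Toy stand-in "model": derails on an unseen token, producing an
# oscillatory, source-detached output (both pathologies from the abstract).
def toy_translate(s):
    if "@" in s:
        return "the the the the the"
    return "le chat est sur le tapis"

print(is_hallucination(toy_translate, "the cat is on the mat",
                       "the @ cat is on the mat"))  # True
```

A real study would use a trained NMT model and a perturbation set drawn from low-frequency (long-tail) tokens rather than a single marker character.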
Translation systems that automatically extract transfer mappings (rules or examples) from bilingual corpora have been hampered by the difficulty of achieving accurate alignment and acquiring high-quality mappings. We describe an algorithm that uses a best-first strategy and a small alignment grammar to significantly improve the quality of the transfer mappings extracted. For each mapping, frequencies are computed and sufficient context is retained to distinguish competing mappings during translation. Variants of the algorithm are run against a corpus containing 200K sentence pairs and evaluated based on the quality of resulting translations.
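The frequency bookkeeping described above can be sketched as follows. This is a minimal illustration of counting competing mappings and ranking them at translation time, not the paper's extraction algorithm; the function names and the toy fragment pairs are invented for the example.

```python
from collections import Counter

def extract_mappings(aligned_pairs):
    """Tally candidate transfer mappings (source fragment -> target fragment);
    frequencies are kept so competing mappings can be ranked later."""
    counts = Counter()
    for src_frag, tgt_frag in aligned_pairs:
        counts[(src_frag, tgt_frag)] += 1
    return counts

def best_mapping(counts, src_frag):
    """Choose the most frequent target fragment for a given source fragment."""
    candidates = {tgt: n for (s, tgt), n in counts.items() if s == src_frag}
    return max(candidates, key=candidates.get) if candidates else None

counts = extract_mappings([("the cat", "le chat"),
                           ("the cat", "le chat"),
                           ("the cat", "la chatte")])
print(best_mapping(counts, "the cat"))  # le chat
```

In the actual system, the retained context (not just raw frequency) would also be consulted to disambiguate competing mappings.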
In this demonstration, we will present our online parser (available at http://research.microsoft.com/msrsplat) that allows users to submit any sentence and obtain an analysis following the specification of AMR (Banarescu et al., 2014) to a large extent. This AMR analysis is generated by a small set of rules that convert a native Logical Form analysis provided by a preexisting parser (see Vanderwende, 2015) into the AMR format. While we demonstrate the performance of our AMR parser on data sets annotated by the LDC, we will focus attention in the demo on the following two areas: 1) we will make available AMR annotations for the data sets that were used to develop our parser, to serve as a supplement to the LDC data sets, and 2) we will demonstrate AMR parsers for German, French, Spanish and Japanese that make use of the same small set of LF-to-AMR conversion rules.

Introduction

Abstract Meaning Representation (AMR) (Banarescu et al., 2014) is a semantic representation for which a large amount of manually-annotated data is being created, with the intent of constructing and evaluating parsers that generate this level of semantic representation for previously unseen text. Already one method for training an AMR parser has appeared in (Flanigan et al., 2014), and we anticipate that more attempts to train parsers will follow. In this demonstration, we will present our AMR parser, which converts our existing semantic representation formalism, Logical Form (LF), into the AMR format. We do this with two goals: first, as our existing LF is close in design to AMR, we can now use the manually-annotated AMR data sets to measure the accuracy of our LF system, which may serve to provide a benchmark for parsers trained on the AMR corpus. We gratefully acknowledge the contributions made by Banarescu et al. (2014) towards defining a clear and interpretable semantic representation that enables this type of system comparison.
Second, we wish to contribute new AMR data sets comprised of the AMR annotations produced by our AMR parser for the sentences we previously used to develop our LF system. These sentences were curated to cover a wide range of syntactic-semantic phenomena, including those described in the AMR specification. We will also demonstrate the capabilities of our parser to generate AMR analyses for sentences in French, German, Spanish and Japanese, for which no manually-annotated AMR data is available at present.

Abstract Meaning Representation

Abstract Meaning Representation (AMR) is a semantic representation language which aims to assign the same representation to sentences that have the same basic meaning.
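AMR graphs are conventionally written in PENMAN notation: each node gets a variable, a concept, and role-labeled children, and a repeated variable expresses reentrancy. The sketch below is a hypothetical illustration of that notation, not the demonstrated LF-to-AMR rule set; the node dictionary layout and the PropBank sense IDs are illustrative assumptions.

```python
def to_penman(node, indent=0):
    """Render a nested node dict as a PENMAN-style AMR string.
    A bare string is treated as a reentrant variable reference."""
    if isinstance(node, str):
        return node
    pad = " " * indent
    out = f"({node['var']} / {node['concept']}"
    for role, child in node.get("args", {}).items():
        out += f"\n{pad}  :{role} " + to_penman(child, indent + 2)
    return out + ")"

# "The boy wants to go" -- the boy is both the wanter and the goer,
# so variable b reappears as a reentrancy.
sentence = {"var": "w", "concept": "want-01",
            "args": {"ARG0": {"var": "b", "concept": "boy"},
                     "ARG1": {"var": "g", "concept": "go-01",
                              "args": {"ARG0": "b"}}}}
print(to_penman(sentence))
```

The printed graph nests one level per role, mirroring how the same representation is shared across paraphrases of a sentence.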
We describe a novel approach to MT that combines the strengths of the two leading corpus-based approaches: Phrasal SMT and EBMT. We use a syntactically informed decoder and reordering model based on the source dependency tree, in combination with conventional SMT models, to incorporate the power of phrasal SMT with the linguistic generality available in a parser. We show that this approach significantly outperforms a leading string-based Phrasal SMT decoder and an EBMT system. We present results from two radically different language pairs, and investigate the sensitivity of this approach to parse quality by using two distinct parsers and oracle experiments. We also validate our automated BLEU scores with a small human evaluation.
Recognizing textual entailment is a challenging problem and a fundamental component of many applications in natural language processing. We present a novel framework for recognizing textual entailment that focuses on the use of syntactic heuristics to recognize false entailment. We give a thorough analysis of our system, which demonstrates state-of-the-art performance on a widely-used test set.
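One family of syntactic heuristics for catching false entailment is mismatch detection between the text and the hypothesis. The snippet below shows a single, deliberately simple example of this style (a negation-mismatch check over token sets); it is an invented illustration, not one of the system's actual heuristics, and a real implementation would operate on parse trees rather than flat token lists.

```python
NEGATIONS = {"not", "no", "never", "n't"}

def negation_mismatch(text_tokens, hyp_tokens):
    """Flag likely FALSE entailment when exactly one of the two
    sentences contains a negation marker."""
    t_neg = bool(NEGATIONS & set(text_tokens))
    h_neg = bool(NEGATIONS & set(hyp_tokens))
    return t_neg != h_neg

print(negation_mismatch("the deal was approved".split(),
                        "the deal was not approved".split()))  # True
```

Heuristics like this trade recall for precision: they fire on clear structural clues for non-entailment and leave the remaining cases to other components.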