State-of-the-art natural language processing systems rely on supervision in the form of annotated data to learn competent models. These models are generally trained on data in a single language (usually English), and cannot be directly used beyond that language. Since collecting data in every language is not realistic, there has been a growing interest in cross-lingual language understanding (XLU) and low-resource cross-language transfer. In this work, we construct an evaluation set for XLU by extending the development and test sets of the Multi-Genre Natural Language Inference Corpus (MultiNLI) to 15 languages, including low-resource languages such as Swahili and Urdu. We hope that our dataset, dubbed XNLI, will catalyze research in cross-lingual sentence understanding by providing an informative standard evaluation task. In addition, we provide several baselines for multilingual sentence understanding, including two based on machine translation systems, and two that use parallel data to train aligned multilingual bag-of-words and LSTM encoders. We find that XNLI represents a practical and challenging evaluation suite, and that directly translating the test data yields the best performance among available baselines.
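As a rough illustration of the translate-test baseline mentioned above, the sketch below machine-translates each premise/hypothesis pair into English and scores it with an English-only NLI classifier. The `translate` and `classify_nli` callables are hypothetical stand-ins, not components released with XNLI.

```python
# Minimal sketch of a "translate test" evaluation for cross-lingual NLI:
# non-English premise/hypothesis pairs are translated into English and
# scored with an English NLI classifier. The MT system and classifier
# are assumed to be supplied by the caller.

from typing import Callable, List, Tuple

LABELS = ("entailment", "neutral", "contradiction")  # standard NLI label set

def translate_test_accuracy(
    examples: List[Tuple[str, str, str]],     # (premise, hypothesis, gold_label) in language X
    translate: Callable[[str], str],          # X -> English MT system (assumed available)
    classify_nli: Callable[[str, str], str],  # English NLI model returning one of LABELS
) -> float:
    """Translate each pair into English, classify it, and report accuracy."""
    correct = 0
    for premise, hypothesis, gold in examples:
        pred = classify_nli(translate(premise), translate(hypothesis))
        correct += int(pred == gold)
    return correct / max(len(examples), 1)

if __name__ == "__main__":
    # Toy demonstration with an identity "translation" and a trivial classifier.
    toy = [("A man is sleeping.", "A person is asleep.", "entailment")]
    print(translate_test_accuracy(toy, translate=lambda s: s,
                                  classify_nli=lambda p, h: "entailment"))
```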
Question answering (QA) models have shown rapid progress enabled by the availability of large, high-quality benchmark datasets. Such annotated datasets are difficult and costly to collect, and rarely exist in languages other than English, making it challenging to build QA systems that work well in other languages. In order to develop such systems, it is crucial to invest in high-quality multilingual evaluation benchmarks to measure progress. We present MLQA, a multi-way aligned extractive QA evaluation benchmark intended to spur research in this area. MLQA contains QA instances in 7 languages: English, Arabic, German, Spanish, Hindi, Vietnamese and Simplified Chinese. It has over 12K instances in English and 5K in each of the other languages, with each instance parallel between 4 languages on average. We evaluate state-of-the-art cross-lingual models and machine-translation-based baselines on MLQA. In all cases, transfer results are significantly behind training-language performance.
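Because MLQA is extractive, systems return an answer span from a context passage and are typically scored by token overlap with the gold span. The sketch below shows a simplified SQuAD-style token F1; the official MLQA evaluation additionally applies language-specific normalization, so this is only an illustration, not the benchmark's scoring script.

```python
# Simplified sketch of span-level token F1 as used by SQuAD-style extractive QA
# benchmarks. Real evaluation scripts normalize punctuation, articles, and
# language-specific tokenization; this version only lowercases and splits on
# whitespace.

from collections import Counter

def token_f1(prediction: str, gold: str) -> float:
    """Whitespace-token F1 between a predicted answer span and a gold span."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

if __name__ == "__main__":
    print(token_f1("the Eiffel Tower", "Eiffel Tower"))  # partial-credit overlap: 0.8
```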
Engaging in a debate with oneself or others to make decisions is an integral part of our day-to-day life. A debate on a topic (say, the use of performance-enhancing drugs) typically proceeds by one party making an assertion/claim (say, PEDs are bad for health) and then providing evidence to support that claim (say, a 2006 study shows that PEDs have psychiatric side effects). In this work, we propose the task of automatically detecting such evidence in unstructured text that supports a given claim. This task has many practical applications in decision support and persuasion enhancement across a wide range of domains. We first introduce an extensive benchmark data set tailored for this task, which allows training statistical models and assessing their performance. We then propose a system architecture based on supervised learning to address the evidence detection task. Finally, we report promising experimental results.
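A minimal sketch of a supervised evidence-detection baseline in the spirit described above: each candidate sentence is paired with the claim and classified as supporting evidence or not. The TF-IDF features and logistic-regression classifier here are assumptions chosen for illustration, not the architecture reported in the paper.

```python
# Toy supervised evidence-detection baseline: classify (claim, candidate sentence)
# pairs as supporting evidence (1) or not (0) using TF-IDF features and
# logistic regression.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# (claim, candidate_sentence, label) triples; 1 = the sentence supports the claim.
train_pairs = [
    ("PEDs are bad for health", "A 2006 study shows PEDs have psychiatric side effects", 1),
    ("PEDs are bad for health", "The tournament was held in Madrid last year", 0),
]

texts = [f"{claim} [SEP] {sentence}" for claim, sentence, _ in train_pairs]
labels = [label for _, _, label in train_pairs]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(texts, labels)

query = "PEDs are bad for health [SEP] Clinical reports link PED use to heart damage"
print(model.predict([query]))  # 1 if predicted to be supporting evidence
```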
We describe a novel argumentative structure dataset. This corpus consists of data extracted from hundreds of Wikipedia articles using a meticulously monitored manual annotation process. The result is 2,683 argument elements, collected in the context of 33 controversial topics and organized under a simple claim-evidence structure. The obtained data are publicly available for academic research.
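Purely as an illustration of the claim-evidence structure mentioned above, a record in such a corpus might be modeled as below; the field names (topic, claim, evidence) are hypothetical and are not taken from the released corpus files.

```python
# Hypothetical data model for a topic-organized claim-evidence record.

from dataclasses import dataclass, field
from typing import List

@dataclass
class ClaimRecord:
    topic: str                                          # one of the controversial topics
    claim: str                                          # the asserted claim text
    evidence: List[str] = field(default_factory=list)   # evidence spans supporting the claim

record = ClaimRecord(
    topic="use of performance-enhancing drugs",
    claim="PEDs are bad for health",
    evidence=["A 2006 study shows that PEDs have psychiatric side effects"],
)
print(record.claim, "|", len(record.evidence), "evidence element(s)")
```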
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations: citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.