Tristan Thrush scite author profile

We introduce Dynabench, an open-source platform for dynamic dataset creation and model benchmarking. Dynabench runs in a web browser and supports human-and-model-in-the-loop dataset creation: annotators seek to create examples that a target model will misclassify, but that another person will not. In this paper, we argue that Dynabench addresses a critical need in our community: contemporary models quickly achieve outstanding performance on benchmark tasks but nonetheless fail on simple challenge examples and falter in real-world scenarios. With Dynabench, dataset creation, model development, and model assessment can directly inform each other, leading to more robust and informative benchmarks. We report on four initial NLP tasks, illustrating these concepts and highlighting the promise of the platform, and address potential objections to dynamic benchmarking as a new standard for the field.


BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Scao, Fan, Akiki et al. 2022

Preprint



Learning from the Worst: Dynamically Generated Datasets to Improve Online Hate Detection

Vidgen, Thrush, Waseem 2021


We present a human-and-model-in-the-loop process for dynamically generating datasets and training better performing and more robust hate detection models. We provide a new dataset of ∼40,000 entries, generated and labelled by trained annotators over four rounds of dynamic data creation. It includes ∼15,000 challenging perturbations and each hateful entry has fine-grained labels for the type and target of hate. Hateful entries make up 54% of the dataset, which is substantially higher than comparable datasets. We show that model performance is substantially improved using this approach. Models trained on later rounds of data collection perform better on test sets and are harder for annotators to trick. They also have better performance on HATECHECK, a suite of functional tests for online hate detection. We provide the code, dataset and annotation guidelines for other researchers to use.


Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality

et al. 2022


Dynabench: Rethinking Benchmarking in NLP

Bartolo, Nie, Kaushik et al. 2021

Preprint



