2022
DOI: 10.48550/arxiv.2201.06494
Preprint

AugLy: Data Augmentations for Robustness

Abstract: We introduce AugLy, a data augmentation library with a focus on adversarial robustness. AugLy provides a wide array of augmentations for multiple modalities (audio, image, text, & video). These augmentations were inspired by those that real users perform on social media platforms, some of which were not already supported by existing data augmentation libraries. AugLy can be used for any purpose where data augmentations are useful, but it is particularly well-suited for evaluating robustness and systematically g…

Cited by 13 publications (16 citation statements)
References 11 publications
Order By: Relevance
“…Overall, we find that perturbation augmentation can mitigate demographic bias during classification without any serious degradation to task performance for most tasks on the GLUE benchmark (see Zhang et al., 2021; Sen et al., 2021; Papakipos and Bitton, 2022). Data augmentation has been shown to improve out-of-domain generalization (Ng et al., 2020; Wang et al., 2021a), and some work even uses learned methods for doing data augmentation for syntactic alternatives (Ross et al., 2022).…”
Section: Measuring Fairness With the FairScore
confidence: 72%
“…Is it necessary to train a perturber, or can we just use heuristics? Previous approaches to perturbing data relied on heuristic methods to generate counterfactual data, such as swapping in words from word lists or designing handcrafted grammars to generate perturbations (Zmigrod et al., 2019; Renduchintala and Williams, 2022; Papakipos and Bitton, 2022). However, heuristic approaches suffer from several weaknesses, and training a controlled generation seq2seq model allows us to improve on many of them.…”
Section: Problems Arising From Heuristic Perturbation
confidence: 99%
“…Many projects have proposed particular measurement templates, or prompts for the purpose of measuring bias, usually for large language models (Rudinger et al., 2018; May et al., 2019; Sheng et al., 2019; Kurita et al., 2019; Webster et al., 2020; Gehman et al., 2020; Huang et al., 2020; Vig et al., 2020; Kirk et al., 2021a; Perez et al., 2022), and some even select existing sentences from text sources and swap demographic terms heuristically (Zhao et al., 2019; Wang et al., 2021; Papakipos and Bitton, 2022). Since one of our main contributions is the participatory assembly of a large set of demographic terms, our terms can be slotted into basically any templates to measure imbalances across demographic groups.…”
Section: Related Work
confidence: 99%
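The heuristic word-list swapping that these citing papers describe (and critique) can be sketched in a few lines. The mapping below is a hypothetical toy list for illustration, not AugLy's API or any cited paper's actual implementation; note how the inherent ambiguity of "her" (his/him) already hints at the weaknesses the heuristic approach is criticized for:

```python
import re

# Hypothetical toy word list; real systems use curated, bidirectional
# term lists covering many demographic axes.
TERM_SWAPS = {"he": "she", "she": "he", "his": "her", "her": "his"}

def swap_demographic_terms(sentence: str) -> str:
    """Heuristically swap demographic terms using a word list.

    Each alphabetic token is looked up case-insensitively; matched
    terms are replaced, all other tokens pass through unchanged.
    """
    def replace(match: re.Match) -> str:
        word = match.group(0)
        swapped = TERM_SWAPS.get(word.lower())
        if swapped is None:
            return word
        # Preserve the capitalization of the original token.
        return swapped.capitalize() if word[0].isupper() else swapped

    return re.sub(r"[A-Za-z]+", replace, sentence)
```

For example, `swap_demographic_terms("He lost his keys")` yields `"She lost her keys"`, but a word-level lookup cannot tell possessive "her" from objective "her", which is exactly the kind of error that motivates trained seq2seq perturbers.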
“…It includes a reference set of 1 million images, a development set of 50,000 augmented query images (a subset of which are transformed copies of a reference image), and a training set of 1 million images. About 60% of the query images in DISC21 have been transformed using image augmentations from the AugLy library [Papakipos and Bitton, 2022]. The remaining 40% have been manually edited by humans.…”
Section: The ISC Challenge
confidence: 99%