2021
DOI: 10.1609/aaai.v35i9.16988

GLISTER: Generalization based Data Subset Selection for Efficient and Robust Learning

Abstract: Large scale machine learning and deep models are extremely data-hungry. Unfortunately, obtaining large amounts of labeled data is expensive, and training state-of-the-art models (with hyperparameter tuning) requires significant computing resources and time. Secondly, real-world data is noisy and imbalanced. As a result, several recent papers try to make the training process more efficient and robust. However, most existing work either focuses on robustness or efficiency, but not both. In this work, we introduc…

Cited by 75 publications (79 citation statements)
References 21 publications
“…A combination of the above techniques would also be an exciting future research direction, primarily since many attributes are somehow related, e.g., principal components generated from the same original features or all the columns in databases referring to the same source of knowledge. Combining these techniques with state-of-the-art instance selection and training set selection methods would also be of great importance [42].…”
Section: Solution Overview (mentioning)
confidence: 99%
“…Closely related to our method Clinical are methods that optimize an objective involving a held-out set. GradMatch [13] uses an orthogonal matching pursuit algorithm to select a subset whose gradient closely matches the gradient of a validation set. Another method, Glister-Active [14], formulates an acquisition function that maximizes the log-likelihood on a held-out validation set.…”
Section: Related Work (mentioning)
confidence: 99%
“…GradMatch [13] uses an orthogonal matching pursuit algorithm to select a subset whose gradient closely matches the gradient of a validation set. Another method, Glister-Active [14], formulates an acquisition function that maximizes the log-likelihood on a held-out validation set. We adopt GradMatch and Glister-Active as baselines that target rare classes in our class-imbalance setting and refer to them as T-GradMatch and T-Glister in Sec.…”
Section: Related Work (mentioning)
confidence: 99%
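The selection criterion described in the statements above (pick training points so that a one-step update on them improves a held-out validation objective) can be illustrated with a small sketch. The snippet below is a minimal, hypothetical approximation for a toy logistic-regression model, not the authors' released implementation; the names select_subset, val_log_likelihood, eta, and k are illustrative assumptions.

```python
# Minimal, hypothetical sketch (not the authors' code): greedily add the
# training point whose one-step gradient update most increases the
# log-likelihood of a held-out validation set.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def val_log_likelihood(w, X_val, y_val):
    # Log-likelihood of the validation set under weights w.
    p = sigmoid(X_val @ w)
    eps = 1e-12
    return np.sum(y_val * np.log(p + eps) + (1 - y_val) * np.log(1 - p + eps))

def per_example_grads(w, X, y):
    # Gradient of each training point's log-likelihood w.r.t. w, shape (n, d).
    p = sigmoid(X @ w)
    return (y - p)[:, None] * X

def select_subset(w, X_trn, y_trn, X_val, y_val, k, eta=0.1):
    # Greedy selection: score each candidate by the validation log-likelihood
    # after one gradient step on the points chosen so far plus the candidate.
    grads = per_example_grads(w, X_trn, y_trn)
    chosen, accumulated = [], np.zeros_like(w)
    for _ in range(k):
        best_i, best_val = None, -np.inf
        for i in range(len(X_trn)):
            if i in chosen:
                continue
            w_try = w + eta * (accumulated + grads[i])
            val = val_log_likelihood(w_try, X_val, y_val)
            if val > best_val:
                best_i, best_val = i, val
        chosen.append(best_i)
        accumulated += grads[best_i]
    return chosen

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    w_true = rng.normal(size=5)
    y = (X @ w_true > 0).astype(float)
    X_val, y_val, X_trn, y_trn = X[:50], y[:50], X[50:], y[50:]
    print(select_subset(np.zeros(5), X_trn, y_trn, X_val, y_val, k=10))
```

A gradient-matching criterion in the style of GradMatch would differ only in the scoring step: instead of re-evaluating the validation log-likelihood, candidates would be chosen so that the sum of their gradients approximates the validation-set gradient.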
“…Thus, storing all the instance-wise loss gradients at once is not feasible for RNN-T systems. Killamsetty et al (2021a) selects mini-batches (as used in SGD) instead of individual instances. The memory reduction from this technique is also limited for ASR systems such as RNN-T, since the batch size used here is often small.…”
Section: Limitations of Existing Subset Selection Algorithms (mentioning)
confidence: 99%
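A back-of-the-envelope calculation makes the quoted limitation concrete. All figures below (corpus size, parameter count, batch size) are assumptions chosen only to illustrate the scaling, not numbers from the cited RNN-T setup; the point is that selecting at the mini-batch level divides gradient storage by the batch size, which is a small factor when batches are small.

```python
# Hypothetical numbers, chosen only to illustrate the scaling argument.
n_instances = 1_000_000        # assumed number of training utterances
n_params = 120_000_000         # assumed RNN-T-scale parameter count
batch_size = 8                 # assumed (small) ASR batch size
bytes_per_float = 4

per_instance_gb = n_instances * n_params * bytes_per_float / 1e9
per_batch_gb = (n_instances // batch_size) * n_params * bytes_per_float / 1e9

print(f"per-instance gradient storage: {per_instance_gb:,.0f} GB")
print(f"per-batch gradient storage:    {per_batch_gb:,.0f} GB (only {batch_size}x smaller)")
```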
“…One way to make ASR training more efficient is to train on a subset of the training data, which ensures minimum performance loss (Killamsetty et al., 2021a; Wei et al., 2014; Kaushal et al., 2019; Coleman et al., 2020; Har-Peled and Mazumdar, 2004; Clarkson, 2010; Killamsetty et al., 2021b; Liu et al., 2017). Since training on a subset reduces end-to-end time, the hyperparameter tuning time is also reduced.…”
Section: Introduction (mentioning)
confidence: 99%