Co-Learning for Few-Shot Learning (2022)
DOI: 10.1007/s11063-022-10770-4

Cited by 5 publications (11 citation statements)
References 22 publications
“…performance for 5M, 10M, and 30M model sizes without additional training, and (iii) improves or performs similarly to AutoDistil [22] for 5M, 26.8M, and 50M model sizes. Compared to HAT [19], our best supernet: (i) reduces the gap between the supernet and the standalone model by 26.5%, (ii) yields a better Pareto front for the latency-BLEU tradeoff (100 to 200 ms), and (iii) reduces the number of additional steps needed to close the gap by 39.8%.…”
Citation type: mentioning (confidence: 82%)
“…BERT supernet and standalone are pretrained from scratch on Wikipedia and Books Corpus [30]. We evaluate the performance of the BERT model by finetuning on each of the seven tasks (chosen by AutoDistil [22]) in the GLUE benchmark [18]. The data preprocessing, pretraining settings, and finetuning settings are discussed in A.1.1.…”
Section: Experiments Setup
Citation type: mentioning (confidence: 99%)
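
The excerpt above describes evaluating a pretrained BERT model by finetuning it on individual GLUE tasks. As an illustration only (not the cited paper's code), a minimal Python sketch of that kind of per-task finetuning, assuming the Hugging Face transformers and datasets libraries and using the generic bert-base-uncased checkpoint as a stand-in for the models discussed in the excerpt, could look like this:

# Illustrative sketch only: per-task GLUE finetuning with Hugging Face
# transformers/datasets. "bert-base-uncased" is a generic stand-in checkpoint,
# not the supernet or standalone model from the cited paper.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

task = "mrpc"  # one GLUE task; the cited work finetunes on seven of them
raw = load_dataset("glue", task)
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def tokenize(batch):
    # MRPC is a sentence-pair task; other GLUE tasks use different column names.
    return tokenizer(batch["sentence1"], batch["sentence2"],
                     truncation=True, padding="max_length", max_length=128)

encoded = raw.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

args = TrainingArguments(output_dir="glue-mrpc-finetune",
                         per_device_train_batch_size=32,
                         num_train_epochs=3,
                         learning_rate=2e-5)

trainer = Trainer(model=model, args=args,
                  train_dataset=encoded["train"],
                  eval_dataset=encoded["validation"])
trainer.train()
print(trainer.evaluate())

In practice this loop would be repeated once per task, with task-specific column names and label counts.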