FTR-NAS: Fault-Tolerant Recurrent Neural Architecture Search
2020
DOI: 10.1007/978-3-030-63823-8_67

Cited by 2 publications (4 citation statements)
References 4 publications
“…First, we can observe that the PET methods exhibit noticeable performance differences (standard deviation). This phenomenon is intuitive and demonstrates the critical impact of design differences (the position and quantity of parameters in the tunable module) on the performance of PET methods. This finding has been consistently reported in numerous prior works (Ding et al., 2023; Hu et al., 2022c). However, we find that as the model scale increases (from BERT SMALL to BERT LARGE in sub-figure [a]; from BLOOM 560M to BLOOM 7.1B in sub-figure [b]; from T5 SMALL to T5 XXL in sub-figure [c]), the performance discrepancies among PET methods diminish across all types of models, as evidenced by the decreasing standard deviation (S.D.)…”
Section: Model Scaling Impact on PET Methods (supporting)
confidence: 87%
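
The comparison described in this statement can be reproduced with a few lines of Python: at each model scale, collect one task score per PET method and compute the standard deviation across methods. This is only a minimal sketch; the scores, model names, and method names below are hypothetical placeholders, not values from the cited paper.

```python
# Sketch of the S.D.-based comparison: at each model scale, compute the
# standard deviation of task scores across PET methods.
# All numbers below are made-up placeholders for illustration only.
from statistics import pstdev

scores = {
    "BERT_SMALL": {"Prompt": 61.0, "Adapter": 70.5, "LoRA": 69.8, "BitFit": 66.2},
    "BERT_LARGE": {"Prompt": 78.9, "Adapter": 80.1, "LoRA": 80.0, "BitFit": 79.3},
}

for scale, per_method in scores.items():
    sd = pstdev(per_method.values())
    print(f"{scale}: S.D. across PET methods = {sd:.2f}")
# A shrinking S.D. at larger scales is the trend the quoted statement reports.
```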
“…Unified View of PET Methods: tunable modules θ = {W_1, W_2, ..., W_p}, all sharing the form h_out = f(h_in) + Δh.
Prompt (Lester et al., 2021): W is concatenated to the input hidden states.
Adapter (Houlsby et al., 2019a): W is plugged between the SelfAttn./FFN layers.
LoRA (Hu et al., 2022a): W is plugged into the SelfAttn layers.
BitFit (Ben Zaken et al., 2022): W is added to the bias terms.
Following Hu et al. (2022c), each PET method has p tunable weights W in designed positions.…”
Section: PET Methods (mentioning)
confidence: 99%
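
The unified view quoted above reduces every PET method to h_out = f(h_in) + Δh, where f is a frozen pretrained layer and Δh comes only from the small tunable weights θ = {W_1, ..., W_p}. The sketch below illustrates two such instances with NumPy: a LoRA-style low-rank Δh and a BitFit-style bias term. The dimensions, initializations, and the frozen weight are assumptions chosen for illustration, not details from the cited work.

```python
# Minimal sketch of the unified PET form h_out = f(h_in) + Δh.
# Only the small tunable parameters θ contribute to Δh; f stays frozen.
import numpy as np

d, rank = 16, 2                              # hidden size and low-rank dim (assumed)
rng = np.random.default_rng(0)

W_frozen = rng.normal(size=(d, d))           # frozen pretrained projection: f(h) = h @ W_frozen

# Tunable PET parameters θ (tiny compared to W_frozen)
A = rng.normal(scale=0.01, size=(d, rank))   # LoRA-style low-rank factor
B = np.zeros((rank, d))                      # zero-initialized so Δh starts at 0
bias = np.zeros(d)                           # BitFit-style tunable bias term

def pet_forward(h_in: np.ndarray) -> np.ndarray:
    """Return h_out = f(h_in) + Δh, with Δh produced by the tunable modules only."""
    frozen_out = h_in @ W_frozen             # f(h_in); never updated during tuning
    delta_h = h_in @ A @ B + bias            # Δh from the LoRA factors plus the bias
    return frozen_out + delta_h

h = rng.normal(size=(4, d))                  # a batch of 4 hidden states
print(pet_forward(h).shape)                  # -> (4, 16)
```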