Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics 2019
DOI: 10.18653/v1/p19-1266
Deep Dominance - How to Properly Compare Deep Neural Models

Abstract: Comparing between Deep Neural Network (DNN) models based on their performance on unseen data is crucial for the progress of the NLP field. However, these models have a large number of hyper-parameters and, being non-convex, their convergence point depends on the random values chosen at initialization and during training. Proper DNN comparison hence requires a comparison between their empirical score distributions on unseen data, rather than between single evaluation scores as is standard for more simple, conve…
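The abstract's central point — compare empirical score distributions across runs, not single evaluation scores — can be sketched with a generic two-sample test. The Mann-Whitney U test below is a stand-in, not the paper's own dominance-based analysis, and the score arrays are illustrative numbers, not reported results.

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Hypothetical dev-set scores from multiple runs (different random seeds /
# hyper-parameter draws) of two models -- illustrative numbers only.
scores_a = np.array([0.81, 0.83, 0.80, 0.84, 0.82, 0.83, 0.79, 0.85])
scores_b = np.array([0.78, 0.80, 0.79, 0.81, 0.77, 0.80, 0.78, 0.79])

# One-sided test: is model A's score distribution shifted above model B's?
stat, p = mannwhitneyu(scores_a, scores_b, alternative="greater")
print(f"U={stat:.1f}, p={p:.4f}")
if p < 0.05:
    print("A's distribution dominates B's at alpha=0.05 (by this test)")
```

With overlapping distributions like these, a single run of B could outscore a single run of A, which is exactly why the comparison is made over distributions.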

Cited by 73 publications (61 citation statements)
References 21 publications
“…Deep neural networks' performance on NLP tasks is bound to exhibit large variance. Reimers and Gurevych (2017) and Dror et al. (2019) stress the importance of reporting score distributions instead of a single score for fair(er) comparisons. Dodge et al. (2020), Mosbach et al. (2021), and Zhang et al. (2021) show that fine-tuning pretrained encoders with different random seeds yields performance with large variance.…”
Section: Background and Related Work
confidence: 99%
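The variance-across-seeds point in the excerpt above can be illustrated with a minimal reporting sketch: run the same fine-tuning setup under several seeds and report the score distribution rather than one number. The accuracies are hypothetical, illustrative values.

```python
import statistics

# Hypothetical accuracies from fine-tuning the same pretrained encoder
# with five different random seeds -- illustrative numbers only.
seed_scores = [0.884, 0.861, 0.902, 0.835, 0.879]

mean = statistics.mean(seed_scores)
std = statistics.stdev(seed_scores)

# Report the distribution (mean +/- std over seeds), not a single run.
print(f"accuracy over {len(seed_scores)} seeds: {mean:.3f} +/- {std:.3f}")
```

A single lucky or unlucky seed here would shift the reported accuracy by several points, which is the comparison hazard these papers warn about.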
“…The results show that our structural KD approaches outperform the baselines in all the cases (Table 3). We apply the significance test of Dror et al. (2019) with a significance level of 0.05 and find that the advantages of our structural KD approaches are significant. Please refer to the Appendix for more detailed results.…”
Section: Results
confidence: 95%
“…In this section, we present detailed experimental results. We use the significance test of Dror et al. (2019), a high-quality comparison method for deep neural networks. We evaluate with a significance level of 0.05.…”
Section: Detailed Experimental Results
confidence: 99%
“…We also notice that the accuracy increment is relatively higher for all experiments on the WOS corpus than on DBpedia. A primary reason might be the number of documents in each dataset. Following Dror et al. (2019), the improvements over the seq2seq baseline are significant with a significance level of 0.05. The number of parameters of each combined strategy is up to seven million.…”
Section: Results
confidence: 99%