2020
DOI: 10.2200/s00994ed1v01y202002hlt045

Statistical Significance Testing for Natural Language Processing

Cited by 18 publications (14 citation statements)
References 80 publications
“…Notably, these improvements arise from training on merely three relations, meaning that the model improved its consistency ability and generalized to new relations. We measure the statistical significance of our method compared to the BERT baseline, using McNemar's test (following Dror et al. [2018, 2020]) and find all results to be significant (p 0.01). We also perform an ablation study to quantify the utility of the different components.…”
Section: Improved Consistency Results (mentioning)
confidence: 99%
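
For readers unfamiliar with the test named in the statement above, here is a minimal sketch of an exact two-sided McNemar test on paired per-example correctness flags. It is an illustration only, not the cited authors' code: the function name `mcnemar_exact` and its inputs are hypothetical, and the choice of the exact binomial form (rather than the chi-squared approximation) is an assumption.

```python
from math import comb


def mcnemar_exact(baseline_correct, system_correct):
    """Two-sided exact McNemar test on paired per-example correctness flags.

    Hypothetical helper for illustration; both arguments are equal-length
    sequences of booleans, one entry per test example.
    """
    if len(baseline_correct) != len(system_correct):
        raise ValueError("both systems must be scored on the same examples")

    pairs = list(zip(baseline_correct, system_correct))
    # Only discordant pairs (exactly one system correct) carry information.
    b = sum(1 for base, new in pairs if base and not new)
    c = sum(1 for base, new in pairs if new and not base)
    n = b + c
    if n == 0:
        return 1.0  # the systems never disagree, so there is no evidence

    # Under the null hypothesis each discordant pair favours either system
    # with probability 0.5, so min(b, c) follows Binomial(n, 0.5).
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

Calling `mcnemar_exact(baseline_flags, system_flags)` returns a p-value that can be compared against the chosen significance level.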
“…In order to show that the results are not coincidental, we test the statistical significance of our model. We follow the nonparametric Pitman's permutation test (Dror et al, 2018) and observe that our model is statistically significant when the significance level (α) is taken to be 0.05. Note that this holds true for all metric on both the datasets except ROUGE-2 on ParaNMT-small.…”
Section: Semantic Preservation And (mentioning)
confidence: 99%
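
The permutation test mentioned in the statement above can be sketched as follows. This is a minimal paired, sign-flipping Monte Carlo variant, which is an assumption: the quoted passage does not say which variant or how many resamples were used. The name `paired_permutation_test` and the parameters `n_resamples` and `seed` are hypothetical.

```python
import random


def paired_permutation_test(scores_a, scores_b, n_resamples=10000, seed=0):
    """Two-sided paired permutation test on per-example metric scores.

    Hypothetical helper for illustration; scores_a and scores_b are
    equal-length sequences of per-example scores for two systems.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems must be scored on the same examples")

    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))
    rng = random.Random(seed)

    hits = 0
    for _ in range(n_resamples):
        # Under the null hypothesis the two system labels are exchangeable
        # within each pair, so each difference keeps or flips its sign
        # with equal probability.
        total = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(total) >= observed - 1e-12:
            hits += 1

    # Add-one correction keeps the Monte Carlo estimate strictly above zero.
    return (hits + 1) / (n_resamples + 1)
```

The returned p-value is then compared with the significance level, for example α = 0.05 as in the statement above.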
“…Notably, these improvements arise from training on merely three relations, meaning that the model improved its consistency ability and generalized to new relations. We measure the statistical significance of our method compared to the BERT baseline, using McNemar's test (following Dror et al. (2018, 2020)) and find all results to be significant (pval 0.01). We also perform an ablation study to quantify the utility of the different components.…”
Section: Improved Consistency Results (mentioning)
confidence: 99%