2012
DOI: 10.1002/stvr.1486
A Hitchhiker's guide to statistical tests for assessing randomized algorithms in software engineering

Abstract: Randomized algorithms are widely used to address many types of software engineering problems, especially in the area of software verification and validation with a strong emphasis on test automation. However, randomized algorithms are affected by chance, and so require the use of appropriate statistical tests to be …

Cited by 514 publications (370 citation statements)
References 118 publications
“…/9.5 = 7.5 ∼ 26.7/3.3 = 8.1 are rather similar.⁴ Notice that Q3 yields κ = 0.09 and κ = 0.39 for standalone experiments and "experiments as evaluations", respectively. Random sampling is a controversial issue in SE.…”
Section: Discussion
confidence: 75%
“…However, the low p-values in both the χ² and the Fisher's Exact Test suggest that Q3, Q4, Q5, Q10 could achieve statistical significance with larger samples. In all cases, standalone experiments perform random selection (Q3),⁴ random assignment (Q4), assumption checking (Q5) and reporting of descriptive statistics (Q10) more frequently than "experiments as evaluations". Differences are not so large as in the case of Q1.1 and Q1.2, but still substantial, e.g., 61.9% vs. 13.3% for Q5.…”
Section: Survey Results
confidence: 99%
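The excerpt above compares how often two groups of experiments report a given practice, using a χ² and a Fisher's Exact Test. The cited survey's actual contingency counts are not reproduced here, so the 2×2 table below is purely illustrative; this is a minimal sketch of a two-sided Fisher's exact test built on the hypergeometric distribution, using only the Python standard library:

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins whose probability does not exceed that of the observed table.
    """
    n = a + b + c + d
    row1, col1 = a + b, a + c

    def pmf(x):  # P(top-left cell == x) under fixed margins
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = pmf(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    return sum(pmf(x) for x in range(lo, hi + 1)
               if pmf(x) <= p_obs * (1 + 1e-9))

# Illustrative data only (not from the cited survey): 8/10 standalone
# experiments report a practice vs. 1/6 "experiments as evaluations".
p = fisher_exact_two_sided(8, 2, 1, 5)
print(round(p, 4))  # → 0.035
```

Fisher's exact test is the natural choice here because, unlike χ², it remains valid for the small cell counts typical of survey subgroups.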
“…First of all, the experiments should all run on the same hardware and runtime environment, using comparable configurations (e.g., in terms of timeouts). Techniques using randomization, such as jGenProg, require several repeated runs to get to quantitative results that are representative of a typical run [1]. Some techniques, such as ACS and HDA, rely on a time-consuming preprocessing stage that mines code repositories (and is crucial for effectiveness), and hence it is unclear how to appropriately compare them to techniques, such as JAID, that do not depend on this auxiliary information.…”
Section: E. Threats To Validity
confidence: 99%
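The repeated-runs requirement cited from [1] is exactly the setting the Hitchhiker's guide addresses: compare the distributions of results across repeated runs of each randomized technique, and report a standardized effect size alongside a rank-based test. Below is a minimal sketch of the Vargha–Delaney Â₁₂ effect size recommended in that guide; the run data are made up for illustration:

```python
def a12(xs, ys):
    """Vargha-Delaney A12: probability that a run of technique X yields a
    larger value than a run of technique Y (ties count half).
    0.5 means no difference; 1.0 means X wins in every pairing."""
    greater = sum(1 for x in xs for y in ys if x > y)
    ties = sum(1 for x in xs for y in ys if x == y)
    return (greater + 0.5 * ties) / (len(xs) * len(ys))

# Hypothetical repair rates over 10 repeated runs of two randomized tools.
runs_a = [0.61, 0.58, 0.64, 0.59, 0.62, 0.60, 0.63, 0.57, 0.65, 0.61]
runs_b = [0.52, 0.55, 0.50, 0.54, 0.53, 0.51, 0.56, 0.49, 0.55, 0.52]
print(a12(runs_a, runs_b))  # → 1.0: A beat B in every pair of runs
```

Because Â₁₂ is computed over all pairs of runs, it directly quantifies how representative "a typical run" is, which is why a single run of a randomized technique such as jGenProg is not a sound basis for comparison.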
“…Unfortunately, even such simple contracts are hardly ever available in the most widely used programming languages.¹ Can we still generalize some of the techniques used for contract-based program repair to work effectively without user-written contracts?…”
Section: Introduction
confidence: 99%