2012
DOI: 10.1002/stvr.1486
A Hitchhiker's guide to statistical tests for assessing randomized algorithms in software engineering

Abstract: Randomized algorithms are widely used to address many types of software engineering problems, especially in the area of software verification and validation with a strong emphasis on test automation. However, randomized algorithms are affected by chance, and so require the use of appropriate statistical tests to be …

Cited by 514 publications (370 citation statements)
References 118 publications
“…/9.5 = 7.5 ∼ 26.7/3.3 = 8.1 are rather similar.⁴ Notice that Q3 yields κ = 0.09 and κ = 0.39 for standalone experiments and "experiments as evaluations", respectively. Random sampling is a controversial issue in SE.…”
Section: Discussion
confidence: 75%
“…However, the low p-values in both the χ² and the Fisher's Exact Test suggest that Q3, Q4, Q5, Q10 could achieve statistical significance with larger samples. In all cases, standalone experiments perform random selection (Q3),⁴ random assignment (Q4), assumption checking (Q5) and reporting of descriptive statistics (Q10) more frequently than "experiments as evaluations". Differences are not so large as in the case of Q1.1 and Q1.2, but still substantial, e.g., 61.9% vs. 13.3% for Q5.…”
Section: Survey Results
confidence: 99%
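The excerpt above compares how often two groups of experiments report a given practice, using a χ² and a Fisher's Exact Test. The cited survey's actual contingency counts are not reproduced here, so the 2×2 table below is purely illustrative; this is a minimal sketch of a two-sided Fisher's exact test built on the hypergeometric distribution, using only the Python standard library:

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins whose probability does not exceed that of the observed table.
    """
    n = a + b + c + d
    row1, col1 = a + b, a + c

    def pmf(x):  # P(top-left cell == x) under fixed margins
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = pmf(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    return sum(pmf(x) for x in range(lo, hi + 1)
               if pmf(x) <= p_obs * (1 + 1e-9))

# Illustrative data only (not from the cited survey): 8/10 standalone
# experiments report a practice vs. 1/6 "experiments as evaluations".
p = fisher_exact_two_sided(8, 2, 1, 5)
print(round(p, 4))  # → 0.035
```

Fisher's exact test is the natural choice here because, unlike χ², it remains valid for the small cell counts typical of survey subgroups.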
“…First of all, the experiments should all run on the same hardware and runtime environment, using comparable configurations (e.g., in terms of timeouts). Techniques using randomization, such as jGenProg, require several repeated runs to get to quantitative results that are representative of a typical run [1]. Some techniques, such as ACS and HDA, rely on a time-consuming preprocessing stage that mines code repositories (and is crucial for effectiveness), and hence it is unclear how to appropriately compare them to techniques, such as JAID, that do not depend on this auxiliary information.…”
Section: E. Threats To Validity
confidence: 99%
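The repeated-runs requirement cited from [1] is exactly the setting the Hitchhiker's guide addresses: compare the distributions of results across repeated runs of each randomized technique, and report a standardized effect size alongside a rank-based test. Below is a minimal sketch of the Vargha–Delaney Â₁₂ effect size recommended in that guide; the run data are made up for illustration:

```python
def a12(xs, ys):
    """Vargha-Delaney A12: probability that a run of technique X yields a
    larger value than a run of technique Y (ties count half).
    0.5 means no difference; 1.0 means X wins in every pairing."""
    greater = sum(1 for x in xs for y in ys if x > y)
    ties = sum(1 for x in xs for y in ys if x == y)
    return (greater + 0.5 * ties) / (len(xs) * len(ys))

# Hypothetical repair rates over 10 repeated runs of two randomized tools.
runs_a = [0.61, 0.58, 0.64, 0.59, 0.62, 0.60, 0.63, 0.57, 0.65, 0.61]
runs_b = [0.52, 0.55, 0.50, 0.54, 0.53, 0.51, 0.56, 0.49, 0.55, 0.52]
print(a12(runs_a, runs_b))  # → 1.0: A beat B in every pair of runs
```

Because Â₁₂ is computed over all pairs of runs, it directly quantifies how representative "a typical run" is, which is why a single run of a randomized technique such as jGenProg is not a sound basis for comparison.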
“…Unfortunately, even such simple contracts are hardly ever available in the most widely used programming languages.¹ Can we still generalize some of the techniques used for contract-based program repair to work effectively without user-written contracts?…”
Section: Introduction
confidence: 99%