Identifying the root causes of test flakiness is one of the challenges practitioners face during software testing: flaky tests hamper the testing of the software itself. Since research on test flakiness in large-scale software engineering is scarce, there is a need for an empirical case study that builds a common, grounded understanding of the problem as well as of relevant remedies that can later be evaluated in a large-scale context. This study reports the findings from a multiple-case study. We conducted an online survey to investigate and catalogue the root causes of test flakiness and mitigation strategies. We sought to understand how practitioners perceive test flakiness in closed-source development: how they define test flakiness and which factors they perceive as affecting it. We compared practitioners' perceptions with the available literature and investigated whether those perceptions are reflected in test artefacts, that is, what relationship exists between the perceived factors and the properties of test artefacts. The study identifies 19 factors that professionals perceive to affect test flakiness. These perceived factors are categorized as test code, system under test, CI/test infrastructure, and organization related. We conclude that some of the perceived factors in test flakiness in closed-source development are directly related to non-determinism, whereas other perceived factors concern different aspects, for example, the lack of good properties in a test case, deviations from established processes, and ad hoc decisions. Given the data set from the investigated cases, we conclude that two of the perceived factors (i.e., test case size and test case simplicity) have a strong effect on test flakiness.

KEYWORDS: flaky tests, non-deterministic tests, practitioners' perceptions, software testing, test smells
1 | INTRODUCTION

Regression testing, whether automated or manual, is intended to ensure that changes made in any part of the system do not break existing functionality. Developers submit code changes with the expectation that any test failures will be associated with those modifications. Unfortunately, rather than being the result of changes to the code, some test failures occur due to flaky tests. In the literature, the most common definition of a flaky test is: a test that exhibits both passing and failing outcomes when no changes are introduced into the code base [1]. King et al. extend this definition [2]: "flaky tests exhibit both passing and failing results when neither the code nor test has changed". Flaky tests are also defined as "unreliable tests whose outcome is not deterministic."
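To make the definition concrete, the following is a minimal, hypothetical sketch of one common shape of flakiness: a test whose outcome depends on hidden timing rather than on the code under test. The function names, the simulated latency, and the timeout value are illustrative assumptions, not taken from the studied cases; a seeded random generator stands in for real nondeterministic latency so the demonstration itself is reproducible.

```python
import random

TIMEOUT_MS = 100  # assumed timeout the test waits for a "response"

def flaky_test(rng):
    # Simulates an asynchronous operation whose latency varies between
    # runs; the test passes only if the response beats the timeout.
    # Nothing in the "code base" changes between invocations.
    simulated_latency_ms = rng.uniform(0, 200)  # hidden nondeterminism
    return simulated_latency_ms < TIMEOUT_MS

# Run the identical test 1000 times against unchanged code:
rng = random.Random(42)  # seeded only so this demo is reproducible
outcomes = [flaky_test(rng) for _ in range(1000)]
passes = outcomes.count(True)
failures = outcomes.count(False)
print(f"passes={passes}, failures={failures}")
```

Under this sketch the same test yields both passing and failing outcomes across repeated runs with no code change, which is exactly the behaviour the definitions above describe.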