Testing Container Classes: Random or Systematic?

Sharma, Rohan; Gligoric, Milos; Arcuri, Andrea; Fraser, Gordon; Marinov, Darko

doi:10.1007/978-3-642-19811-3_19

Cited by 44 publications

(37 citation statements)

References 33 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…When c is an ArrayList, contains performs a linear search (lines [18][19][20][21], which is inefficient, so it would have been better to iterate over c and call remove on the set because it has a more efficient inner loop. Indeed, the GCL developers changed their code, replacing the call to removeAll by conceptually inlining the body of removeAll and keeping only the then branch from the body.…”

Section: Category 4 (Inefficient Inner Loops)mentioning

confidence: 99%

See 1 more Smart Citation

Toddler: Detecting performance problems via similar memory-access patterns

Nistor¹,

Song²,

Marinov³

et al. 2013

2013 35th International Conference on Software Engineering (ICSE)

Self Cite

View full text Add to dashboard Cite

Abstract-Performance bugs are programming errors that create significant performance degradation. While developers often use automated oracles for detecting functional bugs, detecting performance bugs usually requires time-consuming, manual analysis of execution profiles. The human effort for performance analysis limits the number of performance tests analyzed and enables performance bugs to easily escape to production. Unfortunately, while profilers can successfully localize slow executing code, profilers cannot be effectively used as automated oracles. This paper presents TODDLER, a novel automated oracle for performance bugs, which enables testing for performance bugs to use the well established and automated process of testing for functional bugs. TODDLER reports code loops whose computation has repetitive and partially similar memory-access patterns across loop iterations. Such repetitive work is likely unnecessary and can be done faster. We implement TODDLER for Java and evaluate it on 9 popular Java codebases. Our experiments with 11 previously known, real-world performance bugs show that TODDLER finds these bugs with a higher accuracy than the standard Java profiler. Using TODDLER, we also found 42 new bugs in six Java projects: Ant, Google Core Libraries, JUnit, Apache Collections, JDK, and JFreeChart. Based on our bug reports, developers so far fixed 10 bugs and confirmed 6 more as real bugs.

show abstract

Section: Category 4 (Inefficient Inner Loops)mentioning

confidence: 99%

“…We focus our efforts on collection classes because they are widely used and make both automated generation [18] and manual writing of tests easier than domain-specific applications such as Groovy or Lucene. Ant, Apache Collections, and Google Core Libraries (GCL) implement collection classes.…”

Section: B Experiments With New Bugs and Performance Testsmentioning

confidence: 99%

Toddler: Detecting performance problems via similar memory-access patterns

Nistor¹,

Song²,

Marinov³

et al. 2013

2013 35th International Conference on Software Engineering (ICSE)

Self Cite

View full text Add to dashboard Cite

show abstract

“…This is of interest as container classes represent a particular type of classes that avoids many problems such as environment interaction, and recent studies have shown that even "simple" random testing can achieve high coverage on such classes [37]. Interestingly, 17 papers exclusively focus on container classes, and many other papers include container classes.…”

Section: Software Engineering Experimentationmentioning

confidence: 99%

Sound empirical evidence in software testing

Fraser

Arcuri

2012

2012 34th International Conference on Software Engineering (ICSE)

Self Cite

View full text Add to dashboard Cite

Abstract-Several promising techniques have been proposed to automate different tasks in software testing, such as test data generation for object-oriented software. However, reported studies in the literature only show the feasibility of the proposed techniques, because the choice of the employed artifacts in the case studies (e.g., software applications) is usually done in a non-systematic way. The chosen case study might be biased, and so it might not be a valid representative of the addressed type of software (e.g., internet applications and embedded systems). The common trend seems to be to accept this fact and get over it by simply discussing it in a threats to validity section. In this paper, we evaluate search-based software testing (in particular the EVOSUITE tool) when applied to test data generation for open source projects. To achieve sound empirical results, we randomly selected 100 Java projects from SourceForge, which is the most popular open source repository (more than 300,000 projects with more than two million registered users). The resulting case study not only is very large (8784 public classes for a total of 291,639 bytecode level branches), but more importantly it is statistically sound and representative for open source projects. Results show that while high coverage on commonly used types of classes is achievable, in practice environmental dependencies prohibit such high coverage, which clearly points out essential future research directions. To support this future research, our SF100 case study can serve as a much needed benchmark for test generation.

show abstract

“…while (!done) { operation = choose(); input = choose(); assume(precondition); operation(input); assert(postcondition); } In fact, this generic harness is quite close to the actual structure of a large number of API-call based testing systems in explicit-state model checking and random testing, including the vast array of container-class testing frameworks [30]. Bounded exhaustive testing and other approaches also fit this framework.…”

Section: Test Generation Vs Test Programmingmentioning

confidence: 88%

Finding common ground: choose, assert, and assume

Groce

Erwig

2012

Proceedings of the Ninth International Workshop on Dynamic Analysis

View full text Add to dashboard Cite

At present, the "testing community" is on good speaking terms, but typically lacks a common language for expressing some computational ideas, even in cases where such a language would be both useful and plausible. In particular, a large body of testing systems define a testing problem in the language of the system under test, extended with operations for choosing inputs, asserting properties, and constraining the domain of executions considered. While the underlying algorithms used for "testing" include symbolic execution, explicit-state model checking, machine learning, and "old fashioned" random testing, there seems to be a common core of expressive need. We propose that the dynamic analysis community could benefit from working with some common syntactic (and to some extent semantic) mechanisms for expressing a body of testing problems. Such a shared language would have immediate practical uses and make cross-tool comparisons and research into identifying appropriate tools for different testing activities easier. We also suspect that considering the more abstract testing problem arising from this minimalist common ground could serve as a basis for thinking about the design of usable embedded domain-specific languages for testing and might help identify computational patterns that have escaped the notice of the community.

show abstract

Testing Container Classes: Random or Systematic?

Cited by 44 publications

References 33 publications

Toddler: Detecting performance problems via similar memory-access patterns

Toddler: Detecting performance problems via similar memory-access patterns

Sound empirical evidence in software testing

Finding common ground: choose, assert, and assume

Contact Info

Product

Resources

About