Today, most automated test generators, such as search-based software testing (SBST) techniques focus on achieving high code coverage. However, high code coverage is not sufficient to maximise the number of bugs found, especially when given a limited testing budget. In this paper, we propose an automated test generation technique that is also guided by the estimated degree of defectiveness of the source code. Parts of the code that are likely to be more defective receive more testing budget than the less defective parts. To measure the degree of defectiveness, we leverage Schwa, a notable defect prediction technique.We implement our approach into EvoSuite, a state of the art SBST tool for Java. Our experiments on the Defects4J benchmark demonstrate the improved efficiency of defect prediction guided test generation and confirm our hypothesis that spending more time budget on likely defective parts increases the number of bugs found in the same time budget. CCS CONCEPTS• Software and its engineering → Software testing and debugging; Search-based software engineering.
Context Machine learning (ML) software systems are permeating many aspects of our life, such as healthcare, transportation, banking, and recruitment. These systems are trained with data that is often biased, resulting in biased behaviour. To address this issue, fairness testing approaches have been proposed to test ML systems for fairness, which predominantly focus on assessing classification-based ML systems. These methods are not applicable to regression-based systems, for example, they do not quantify the magnitude of the disparity in predicted outcomes, which we identify as important in the context of regression-based ML systems. Method: We conduct this study as design science research. We identify the problem instance in the context of emergency department (ED) wait-time prediction. In this paper, we develop an effective and efficient fairness testing approach to evaluate the fairness of regression-based ML systems. We propose fairness degree, which is a new fairness measure for regression-based ML systems, and a novel search-based fairness testing (SBFT) approach for testing regression-based machine learning systems. We apply the proposed solutions to ED wait-time prediction software. Results: We experimentally evaluate the effectiveness and efficiency of the proposed approach with ML systems trained on real observational data from the healthcare domain. We demonstrate that SBFT significantly outperforms existing fairness testing approaches, with up to 111% and 190% increase in effectiveness and efficiency of SBFT compared to the best performing existing approaches. Conclusion: These findings indicate that our novel fairness measure and the new approach for fairness testing of regression-based ML systems can identify the degree of fairness in predictions, which can help software teams to make data-informed decisions about whether such software systems are ready to deploy. The scientific knowledge gained from our work can be phrased as a technological rule; to measure the fairness of the regression-based ML systems in the context of emergency department wait-time prediction use fairness degree and search-based techniques to approximate it.
Automated test generators, such as search-based software testing (SBST) techniques are primarily guided by coverage information. As a result, they are very effective at achieving high code coverage. However, is high code coverage alone sufficient to detect bugs effectively? In this paper, we propose a new SBST technique, predictive many objective sorting algorithm (PreMOSA), which augments coverage information with defect prediction information to decide where to increase the test coverage in the class under test (CUT).Through an experimental evaluation using 420 labelled bugs on the Defects4J benchmark and using theoretical defect predictors, we demonstrate the improved effectiveness and efficiency of PreMOSA in detecting bugs when using any acceptable defect predictor, i.e., a defect predictor with recall and precision ≥ 75%, compared to the state-of-theart dynamic many objective sorting algorithm (DynaMOSA). PreMOSA detects up to 8.3% more labelled bugs on average than DynaMOSA when given a time budget of 2 minutes for test generation per CUT.
Previous studies have shown that Automated Program Repair (apr) techniques suffer from the overfitting problem. Overfitting happens when a patch is run and the test suite does not reveal any error, but the patch actually does not fix the underlying bug or it introduces a new defect that is not covered by the test suite. Therefore, the patches generated by apr tools need to be validated by human programmers, which can be very costly, and prevents apr tools adoption in practice. Our work aims at increasing developer trust in automated patch generation by minimizing the number of plausible patches that they have to review, thereby reducing the time required to find a correct patch. We introduce a novel light-weight test-based patch clustering approach called xTestCluster, which clusters patches based on their dynamic behavior. xTestCluster is applied after the patch generation phase in order to analyze the generated patches from one or more repair tools and to provide
Introduction Conventional sleep macro-architecture measures of sleep disruption have not been associated with the adverse neurocognitive sequelae of sleep disordered breathing (SDB). Sleep spindles protecting the sleeping brain from external sensory stimuli and can serve as markers of sleep integrity. We investigated the relationship between sleep spindles and sleep fragmentation and neurocognition across the spectrum of SDB in children. Methods Children 3-12y referred for clinical assessment of SDB and age matched control children from the community were recruited. Sleep spindles were identified manually during N2 and N3 sleep. Spindle activity was characterised as spindle number, spindle density (number of spindles/ h) and spindle intensity (spindle density x average spindle duration). The Stanford-Binet Intelligence Scales measured global intellectual ability and the NeuroPSYchological assessment (NEPSY-II) measured language, attention, visuospatial ability and sensorimotor skills. Results Children were grouped into control, Primary Snoring, Mild OSA and Moderate/severe OSA, N=10/ group. All measures of spindle activity were lower in the SDB groups compared to the Control children and this reached statistical significance for Mild OSA (p<0.05 for all). Spindle activity was not correlated with any measure of the Stanford-Binet. Overall, all measures of spindle activity were positively correlated with the Design Copy and the Inhibition Naming combined scale score and negatively correlated with Auditory Attention subscales of the NEPSY-II. Conclusion The reduced spindle activity observed in the children with SDB, particularly Mild OSA, indicates that sleep micro-architecture is disrupted and that this disruption may underpin the negative effects of SDB on attention, learning and memory.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2025 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.