Benchmarks provide an experimental basis for evaluating software engineering processes or techniques in an objective and repeatable manner. As a contribution to current benchmark materials, we present the FAULTBENCH benchmark for evaluating and comparing techniques that prioritize and classify alerts generated by static analysis tools. Alert prioritization and classification addresses a common problem with static analysis tools: many of the generated alerts either do not indicate a fault or are unimportant to the developer. We used FAULTBENCH to evaluate three versions of the AWARE adaptive ranking model for prioritizing and classifying static analysis alerts. Individual FAULTBENCH subjects have different best prioritization and classification techniques, so evaluating a technique on a single subject could produce misleading results. Together, the FAULTBENCH subjects provide a precise and general evaluation of alert prioritization and classification techniques.
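As an aside on why multiple subjects matter, the following sketch aggregates a technique's score across several benchmark subjects instead of reporting a single subject's result. Subject names, technique names, and metric values are hypothetical illustrations, not FAULTBENCH data or AWARE results.

```python
# Hypothetical example: the best technique on one subject need not be the best
# overall, which is why a benchmark aggregates results across subjects.
# All names and values below are invented for illustration.
results = {
    "technique_A": {"subject1": 0.90, "subject2": 0.55, "subject3": 0.60},
    "technique_B": {"subject1": 0.70, "subject2": 0.75, "subject3": 0.72},
}

def mean_score(per_subject):
    # Average a technique's metric over all benchmark subjects.
    return sum(per_subject.values()) / len(per_subject)

for name, per_subject in results.items():
    print(name, round(mean_score(per_subject), 2))
# technique_A wins on subject1 alone, but technique_B has the better average,
# illustrating how a single-subject evaluation can mislead.
```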
Context: Automated static analysis (ASA) identifies potential source code anomalies early in the software development lifecycle that could lead to field failures. Excessive alert generation and a large proportion of unimportant or incorrect alerts (unactionable alerts) may cause developers to reject the use of ASA. Techniques that identify anomalies important enough for developers to fix (actionable alerts) may increase the usefulness of ASA in practice. Objective: The goal of this work is to synthesize available research results to inform evidence-based selection of actionable alert identification techniques (AAITs). Method: Relevant studies about AAITs were gathered via a systematic literature review. Results: We selected 21 peer-reviewed studies of AAITs. The techniques use alert type selection; contextual information; data fusion; graph theory; machine learning; mathematical and statistical models; or dynamic detection to classify and prioritize actionable alerts. All of the AAITs are evaluated via an example with a variety of evaluation metrics. Conclusion: The selected studies support, with varying strength, the premise that the effective use of ASA is improved by supplementing ASA with an AAIT. Seven of the 21 selected studies reported the precision of the proposed AAITs. The two studies with the highest precision built models using the subject program's history. Precision measures how well a technique identifies true actionable alerts out of all predicted actionable alerts; it does not measure the number of actionable alerts missed by an AAIT or how well an AAIT identifies unactionable alerts. Inconsistent use of evaluation metrics, subject programs, and ASAs in the selected studies precludes meta-analysis and prevents the current results from informing evidence-based selection of an AAIT. We propose building an actionable alert identification benchmark for the comparison and evaluation of AAITs from the literature on a standard set of subjects, using a common set of evaluation metrics.
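To make the precision discussion above concrete, the following sketch computes precision alongside recall from a hypothetical confusion matrix; the counts are invented for illustration and are not taken from any of the selected studies.

```python
# Illustrative sketch (not from the selected studies): precision captures how many
# predicted-actionable alerts are truly actionable, while recall captures how many
# truly actionable alerts the AAIT found -- the quantity precision alone misses.

def precision_and_recall(tp, fp, fn):
    """tp: actionable alerts correctly flagged; fp: unactionable alerts flagged;
    fn: actionable alerts the AAIT missed."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical counts: an AAIT flags 50 alerts, 40 of which are actionable,
# but it misses 60 other actionable alerts.
p, r = precision_and_recall(tp=40, fp=10, fn=60)
print(f"precision={p:.2f}, recall={r:.2f}")  # precision=0.80, recall=0.40
```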
The computer science education (CSEd) research community consists of a large group of passionate CS educators who often contribute to other disciplines of CS research. There has been a trend in other disciplines toward more rigorous and empirical evaluation of hypotheses. Prior investigations of the then-current state of CSEd research showed a distinct lack of rigor in the top research publication venues, with most papers falling into the general category of experience reports. In this paper, we present our examination of the two most recent proceedings of the SIGCSE Technical Symposium, providing a snapshot of the current state of empiricism at the largest CSEd venue. Our goal is to categorize the current state of empiricism in the SIGCSE Technical Symposium and identify where the community might benefit from increased empiricism when conducting CSEd research. We found that empirical validation of CSEd research has increased to over 70%; however, our findings suggest that current CSEd research minimizes replication, precluding meta-analysis and theory building.
Static analysis tools are useful for finding common programming mistakes that often lead to field failures. However, static analysis tools regularly generate a high number of false positive alerts, requiring manual inspection by the developer to determine whether an alert indicates a fault. The adaptive ranking model presented in this paper utilizes feedback from developers about inspected alerts to rank the remaining alerts by the likelihood that each alert indicates a fault. Alerts are ranked based on the homogeneity of populations of generated alerts; historical developer feedback in the form of suppressing false positives and fixing true positive alerts; and historical, application-specific data about the alert ranking factors. The ordering of alerts generated by the adaptive ranking model is compared to baselines of randomly-, optimally-, and static-analysis-tool-ordered alerts in a small role-based health care application. The adaptive ranking model provides developers with 81% of the true positive alerts after investigating only 20% of the alerts, whereas an average of 50 random orderings of the same alerts found only 22% of the true positive alerts after investigating 20% of the generated alerts.
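The following sketch illustrates the general idea of feedback-driven ranking described above: a per-alert-type score increases when developers fix alerts of that type and decreases when they suppress them, and the remaining alerts are ordered by that score. The factors, weights, and class names are assumptions for illustration and do not reproduce the AWARE model's actual ranking computation.

```python
# Minimal sketch of feedback-driven alert ranking (assumed factors and weights;
# not the published AWARE adaptive ranking model).
from collections import defaultdict

class AdaptiveRanker:
    def __init__(self):
        # Running score per alert type, adjusted by developer feedback.
        self.type_score = defaultdict(float)

    def record_fix(self, alert_type):
        # A fixed alert suggests other alerts of this type are likely true positives.
        self.type_score[alert_type] += 1.0

    def record_suppression(self, alert_type):
        # A suppressed alert suggests other alerts of this type are likely false positives.
        self.type_score[alert_type] -= 1.0

    def rank(self, alerts):
        # alerts: iterable of (alert_id, alert_type); highest-scoring types first.
        return sorted(alerts, key=lambda a: self.type_score[a[1]], reverse=True)

ranker = AdaptiveRanker()
ranker.record_fix("NULL_DEREFERENCE")
ranker.record_suppression("UNUSED_IMPORT")
remaining = [("a1", "UNUSED_IMPORT"), ("a2", "NULL_DEREFERENCE"), ("a3", "STYLE")]
print(ranker.rank(remaining))  # NULL_DEREFERENCE first, UNUSED_IMPORT last
```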