Context:-With numerous CVE reports filed year after year, C (and C++) are still among the prog. languages most susceptible to introducing vulnerabilities-Static analyzers are widely used by developers as they are oftentimes less expensive to set up and execute than dynamic testing tools (e.g., fuzzers or concolic execution engines)
Fuzzing is a widely used automated testing technique that uses random inputs to provoke program crashes indicating security breaches. A difficult but important question is when to stop a fuzzing campaign. Usually, a campaign is terminated when the number of crashes and/or covered code elements has not increased over a certain period of time. To avoid premature termination when a ramp-up time is needed before vulnerabilities are reached, code coverage is often preferred over crash count to decide when to terminate a campaign. However, a campaign might only increase the coverage on non-security-critical code or repeatedly trigger the same crashes. For these reasons, both code coverage and crash count tend to overestimate the fuzzing effectiveness, unnecessarily increasing the duration and thus the cost of the testing process.The present paper explores the tradeoff between the amount of saved fuzzing time and number of missed bugs when stopping campaigns based on the saturation of covered, potentially vulnerable functions rather than triggered crashes or regular function coverage. In a large-scale empirical evaluation of 30 open-source C programs with a total of 240 security bugs and 1,280 fuzzing campaigns, we first show that binary classification models trained on software with known vulnerabilities (CVEs), using lightweight machine learning features derived from findings of static application security testing tools and proven software metrics, can reliably predict (potentially) vulnerable functions. Second, we show that our proposed stopping criterion terminates 24-hour fuzzing campaigns 6-12 hours earlier than the saturation of crashes and regular function coverage while missing (on average) fewer than 0.5 out of 12.5 contained bugs. CCS CONCEPTS• Security and privacy → Software security engineering.
No abstract
Continuous integration (CI) pipelines are commonly used to execute regression tests before pull requests are merged. Regression test selection (RTS) aims to reduce the required testing effort and feedback time for developers. However, existing RTS techniques are imprecise for tests with cross-language links to compiled C++ binaries or unsafe if tests use external files. This is problematic because modern software in fact involves several programming languages and (non-)code artifacts such as configuration files. In this paper, we present BINARYRTS, a novel RTS technique that leverages dynamic binary instrumentation to collect the covered functions and accessed external files for each test. BINARYRTS then selects tests depending on changes issued to C++ binaries or external (non-)code artifacts. When evaluating BINARYRTS in our large-scale industrial context, we are able to exclude on average up to 74% of tests without missing real failures. We release BINARYRTS as the first publicly available RTS tool for software involving C++ code.
Reachable coverage is the number of code elements in the search space of a fuzzer (i.e., an automatic software testing tool). A fuzzer cannot find bugs in code that is unreachable. Hence, reachable coverage quantifies fuzzer effectiveness. Using static program analysis, we can compute an upper bound on the number of reachable coverage elements, e.g., by extracting the call graph. However, we cannot decide whether a coverage element is reachable in general. If we could precisely determine reachable coverage efficiently, we would have solved the software verification problem. Unfortunately, we cannot approach a given degree of accuracy for the static approximation, either.In this paper, we advocate a statistical perspective on the approximation of the number of elements in the fuzzer's search space, where accuracy does improve as a function of the analysis runtime. In applied statistics, corresponding estimators have been developed and well established for more than a quarter century. These estimators hold an exciting promise to finally tackle the long-standing challenge of counting reachability. In this paper, we explore the utility of these estimators in the context of fuzzing. Estimates of reachable coverage can be used to measure (a) the amount of untested code, (b) the effectiveness of the testing technique, and (c) the completeness of the ongoing fuzzing campaign (w.r.t. the asymptotic max. achievable coverage). We make all data and our analysis publicly available.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2025 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.