Learn&Fuzz: Machine learning for input fuzzing

Godefroid, Patrice; Peleg, Hila; Singh, Rishabh

doi:10.1109/ase.2017.8115618

Cited by 287 publications

(238 citation statements)

References 25 publications

Supporting

Mentioning

236

Contrasting

Unclassified

“…Table I shows that the new seed corpora generated by our framework caused up to 2.48% more basic blocks and 24.30% more execution paths being covered than the original seed corpus. Our results significantly surpassed similar works such as [1], which generated seed corpora by learning the grammar of the PDF files and the new corpora covered 0.11% more instructions. We next evaluated our framework by fuzzing MuPDF and three other PDF viewers (pdfium, podofo, and poppler) with the original and generated corpus for 24 hours.…”

Section: Evaluations (contrasting)

confidence: 51%

“…Most existing fuzzing tools, or fuzzers, generate excessive test inputs by mutating a pre-selected corpus of seed inputs with the hope to reveal potential bugs in the target program. Therefore, extensive research effort has been dedicated to improving the quality of seed corpora [1]. Existing approaches in this direction, however, share a common limitation that they focus on discovering syntactic or semantic constraints posed by the target program for inputs in order to generate valid seed inputs.…”

Section: Introduction (mentioning)

confidence: 99%

Optimizing Seed Inputs in Fuzzing with Machine Learning

Cheng, Zhang, et al. 2019

2019 IEEE/ACM 41st International Conference on Software Engineering: Companion Proceedings (ICSE-Companion)

The success of a fuzzing campaign is heavily depending on the quality of seed inputs used for test generation. It is however challenging to compose a corpus of seed inputs that enable high code and behavior coverage of the target program, especially when the target program requires complex input formats such as PDF files. We present a machine learning based framework to improve the quality of seed inputs for fuzzing programs that take PDF files as input. Given an initial set of seed PDF files, our framework utilizes a set of neural networks to 1) discover the correlation between these PDF files and the execution in the target program, and 2) leverage such correlation to generate new seed files that more likely explore new paths in the target program. Our experiments on a set of widely used PDF viewers demonstrate that the improved seed inputs produced by our framework could significantly increase the code coverage of the target program and the likelihood of detecting program crashes.

“…Therefore, Superion may have trouble finding proprietary grammars or undocumented extensions to standard grammars. However, several automatic grammar inference techniques [7,29,34,63] have been proposed, we plan to integrate such techniques to have a wider applicability.…”

Section: H Discussion (mentioning)

confidence: 99%

Superion: Grammar-Aware Greybox Fuzzing

Wang, Chen, Wei, et al. 2019

2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)

185

In recent years, coverage-based greybox fuzzing has proven itself to be one of the most effective techniques for finding security bugs in practice. Particularly, American Fuzzy Lop (AFL for short) is deemed to be a great success in fuzzing relatively simple test inputs. Unfortunately, when it meets structured test inputs such as XML and JavaScript, those grammar-blind trimming and mutation strategies in AFL hinder the effectiveness and efficiency. To this end, we propose a grammar-aware coverage-based greybox fuzzing approach to fuzz programs that process structured inputs. Given the grammar (which is often publicly available) of test inputs, we introduce a grammar-aware trimming strategy to trim test inputs at the tree level using the abstract syntax trees (ASTs) of parsed test inputs. Further, we introduce two grammar-aware mutation strategies (i.e., enhanced dictionary-based mutation and tree-based mutation). Specifically, tree-based mutation works via replacing subtrees using the ASTs of parsed test inputs. Equipped with grammar-awareness, our approach can carry the fuzzing exploration into width and depth. We implemented our approach as an extension to AFL, named Superion; and evaluated the effectiveness of Superion on real-life large-scale programs (a XML engine libplist and three JavaScript engines WebKit, Jerryscript and ChakraCore). Our results have demonstrated that Superion can improve the code coverage (i.e., 16.7% and 8.8% in line and function coverage) and bug-finding capability (i.e., 31 new bugs, among which we discovered 21 new vulnerabilities with 16 CVEs assigned and 3.2K USD bug bounty rewards received) over AFL and jsfunfuzz. We also demonstrated the effectiveness of our grammar-aware trimming and mutation.
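The tree-based mutation mentioned in this abstract (replacing a subtree of one parsed input with a subtree taken from another) can be illustrated with a minimal Python sketch. This is not Superion's implementation: the tuple-based tree shape and the helper names `subtrees`, `replace_at`, `tree_mutate`, and `to_text` are all assumptions made for illustration, and a real tool would obtain its trees from a grammar-based parser.

```python
import random

# Assumed tree shape for this sketch: (label, [children]); a leaf has no children.

def subtrees(tree, path=()):
    """Yield (path, subtree) for every node; path is a tuple of child indices."""
    yield path, tree
    for i, child in enumerate(tree[1]):
        yield from subtrees(child, path + (i,))

def replace_at(tree, path, new_subtree):
    """Return a copy of tree with the node at path swapped for new_subtree."""
    if not path:
        return new_subtree
    label, children = tree
    i = path[0]
    rebuilt = replace_at(children[i], path[1:], new_subtree)
    return (label, children[:i] + [rebuilt] + children[i + 1:])

def tree_mutate(target, donor, rng):
    """Replace one randomly chosen node of target with a random subtree of donor."""
    path, _ = rng.choice(list(subtrees(target)))
    _, replacement = rng.choice(list(subtrees(donor)))
    return replace_at(target, path, replacement)

def to_text(tree):
    """Serialize a tree back into a flat test input."""
    label, children = tree
    if not children:
        return label
    return f"{label}({', '.join(to_text(c) for c in children)})"

target = ("add", [("x", []), ("y", [])])
donor = ("mul", [("2", []), ("z", [])])
mutant = tree_mutate(target, donor, random.Random(0))
```

Because the replacement is always a whole subtree of a parsed input, the mutant stays well-formed at the tree level, which is the property a grammar-blind byte mutation does not give.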

“…There has also been some recent interest in automatically generating input grammars from existing inputs, using machine learning [41] and language inference algorithms [22]. Similarly, DI-FUZE [33] infers device driver interfaces from a running kernel to boostrap subsequent structured fuzzing.…”

Section: Related Work (mentioning)

confidence: 99%

Semantic fuzzing with zest

Padhye, Lemieux, Sen, et al. 2019

Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis

107

Programs expecting structured inputs often consist of both a syntactic analysis stage, which parses raw input, and a semantic analysis stage, which conducts checks on the parsed input and executes the core logic of the program. Generator-based testing tools in the lineage of QuickCheck are a promising way to generate random syntactically valid test inputs for these programs. We present Zest, a technique which automatically guides QuickCheck-like random-input generators to better explore the semantic analysis stage of test programs. Zest converts random-input generators into deterministic parametric generators. We present the key insight that mutations in the untyped parameter domain map to structural mutations in the input domain. Zest leverages program feedback in the form of code coverage and input validity to perform feedback-directed parameter search. We evaluate Zest against AFL and QuickCheck on five Java programs: Maven, Ant, BCEL, Closure, and Rhino. Zest covers 1.03×-2.81× as many branches within the benchmarks' semantic analysis stages as baseline techniques. Further, we find 10 new bugs in the semantic analysis stages of these benchmarks. Zest is the most effective technique in finding these bugs reliably and quickly, requiring at most 10 minutes on average to find each bug.

CCS Concepts: Software and its engineering → Software testing and debugging.

Keywords: Structure-aware fuzzing, property-based testing, random testing
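The idea of a deterministic parametric generator described in this abstract can be sketched in a few lines of Python. This is an illustrative toy, not Zest's code: the `ParamStream` class, the `gen_xml` generator, the zero-padding rule, and the depth limit are all assumptions, and the coverage-guided search loop is omitted entirely.

```python
class ParamStream:
    """Replays a fixed list of byte values as the generator's only source of choices."""

    def __init__(self, params):
        self.params = list(params)
        self.pos = 0

    def next_byte(self):
        # Assumption of this sketch: pad with zeros when the generator
        # asks for more parameters than were supplied.
        if self.pos >= len(self.params):
            self.params.append(0)
        value = self.params[self.pos]
        self.pos += 1
        return value

    def choose(self, options):
        return options[self.next_byte() % len(options)]


def gen_xml(stream, depth=0):
    """A QuickCheck-style generator whose every choice is read from the stream."""
    tag = stream.choose(["a", "b", "c"])
    n_children = 0 if depth >= 2 else stream.next_byte() % 3
    children = "".join(gen_xml(stream, depth + 1) for _ in range(n_children))
    return f"<{tag}>{children}</{tag}>"


def generate(params):
    return gen_xml(ParamStream(params))


original = generate([0, 1, 1, 0])  # "<a><b></b></a>"
mutated = generate([0, 2, 1, 0])   # "<a><b></b><a></a></a>"
```

Changing a single parameter (the second byte, from 1 to 2) adds a child element rather than corrupting the text, so every mutated parameter list still yields syntactically valid output.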
