An Exploratory Study on Machine Learning to Combine Security Vulnerability Alerts from Static Analysis Tools

Pereira, Jose D'Abruzzo; Campos, Joao R.; Vieira, Marco

doi:10.1109/ladc48089.2019.8995685

Cited by 10 publications

(5 citation statements)

References 23 publications

“…Dataset preparation: Authors used existing labeled datasets as well as created their own datasets to train ml models. Specifically, a set of studies [48,156,219,243,254,263,298] used available labeled datasets for php, Java, C, C++, and Android applications to train vulnerability detection models. In other cases, Russell et al [261] extended an existing dataset with millions of C and C++ functions and then labeled it based on the output of three static analyzers (i.e., Clang, CppCheck, and Flawfinder).…”

Section: Vulnerability Analysis (mentioning)

confidence: 99%

“…Repository and file metrics: Perl et al [244] collected GitHub repository meta-data (i.e., programming language, star count, fork count, and number of commits) in addition to source code metrics. Other authors [95,243] used file meta-data such as files' creation and modification time, machine type, file size, and linker version.…”

Section: Vulnerability Analysis (mentioning)

confidence: 99%

“…Traditional ML techniques: One set of studies [10,220,229,238,243,244,261,279] used traditional ml algorithms such as Support Vector Machine, Linear Regression, Decision Tree, and Random Forest to train their models. Specifically, Ali Alatwi et al [10], Perl et al [244], Russell et al [261] selected Support Vector Machine because it is not affected by over-fitting when having very high dimensional variable spaces.…”

Section: Vulnerability Analysis (mentioning)

confidence: 99%

“…Along the similar lines, Ndichu et al [229] used Support Vector Machine to train their model with linear kernel. Pereira et al [243] used Decision Tree, Linear Regression, and Lasso to train their models. Compared to the above studies, Shar et al [279] used both supervised (i.e., Linear Regression and Random Forest) and semi-supervised (i.e., Co-trained Random Forest) algorithms to train their models since most of that datasets were not labeled.…”

Section: Vulnerability Analysis (mentioning)

confidence: 99%

A Survey on Machine Learning Techniques for Source Code Analysis

Sharma, Kechagia, Georgiou et al. (2021). Preprint.

Context: The advancements in machine learning techniques have encouraged researchers to apply these techniques to a myriad of software engineering tasks that use source code analysis such as testing and vulnerabilities detection. A large number of studies poses challenges to the community to understand the current landscape. Objective: We aim to summarize the current knowledge in the area of applied machine learning for source code analysis. Method: We investigate studies belonging to twelve categories of software engineering tasks and corresponding machine learning techniques, tools, and datasets that have been applied to solve them. To do so, we carried out an extensive literature search and identified 364 primary studies published between 2002 and 2021. We summarize our observations and findings with the help of the identified studies. Results: Our findings suggest that the usage of machine learning techniques for source code analysis tasks is consistently increasing. We synthesize commonly used steps and the overall workflow for each task, and summarize the employed machine learning techniques. Additionally, we collate a comprehensive list of available datasets and tools useable in this context. Finally, we summarize the perceived challenges in this area that include availability of standard datasets, reproducibility and replicability, and hardware resources. CCS Concepts: • Software and its engineering → Software libraries and repositories; Software maintenance tools; Software post-development issues; Maintaining software; • Computing methodologies → Machine learning.

“…In this section, we review the main and recent studies about the combination of different types of web applications security analysis tools with the main objectives of discovering more vulnerabilities and reducing the number of false positives. Several works combine static analysis tools with machine learning techniques for automatic detection of security vulnerabilities in web applications reducing the number of false positives [52,53]. Other approximations are based in attacks and anomalies detection using machine learning techniques [54].…”

Section: Related Work (mentioning)

confidence: 99%

On Combining Static, Dynamic and Interactive Analysis Security Testing Tools to Improve OWASP Top Ten Security Vulnerability Detection in Web Applications

Tudela, Higuera et al. (2020). Applied Sciences.

The designs of the techniques and algorithms used by static, dynamic and interactive security testing tools differ. Therefore, each tool detects, to a greater or lesser extent, each type of vulnerability for which it is designed. In addition, their different designs mean that they have different percentages of false positives. In order to take advantage of the possible synergies that different types of analysis tools may have, this paper combines several static, dynamic and interactive analysis security testing tools: static white box security analysis (SAST), dynamic black box security analysis (DAST) and interactive white box security analysis (IAST), respectively. The aim is to investigate how to improve the effectiveness of security vulnerability detection while reducing the number of false positives. Specifically, two static, two dynamic and two interactive security analysis tools are combined to study their behavior using a specific benchmark for OWASP Top Ten security vulnerabilities, taking into account various scenarios of different criticality in terms of the applications analyzed. Finally, this study analyzes and discusses the values of the selected metrics applied to the results for each n-tools combination.

Effect of Coding Styles in Detection of Web Application Vulnerabilities

Medeiros, Neves (2020). 2020 16th European Dependable Computing Conference (EDCC).
