2020
DOI: 10.3390/app10041270

Collecting Vulnerable Source Code from Open-Source Repositories for Dataset Generation

Abstract: Various machine learning techniques for detecting software vulnerabilities have emerged in scientific and industrial settings. Actors in these settings aim to develop algorithms that predict security threats without human intervention. However, these algorithms require data-driven engines that process large amounts of data, known as datasets. This paper introduces the SonarCloud Vulnerable Code Prospector for C (SVCP4C). This tool aims to collect vulnerable source code from o…

Cited by 10 publications (5 citation statements)
References 40 publications
“…Raducu et al. [14] emphasize that various machine learning techniques have emerged and been developed to detect security vulnerabilities. However, they point out that the performance of these algorithms requires data-driven engines that rely on processing large amounts of data, known as datasets.…”
Section: Related Work (mentioning, confidence: 99%)
“…However, as the literature reviews show, achieving high performance requires studies to use structured datasets with extracted features [7,10,11,14,15]. The NVD dataset, which dominates the field, cannot provide this because of its structure.…”
Section: Related Work (mentioning, confidence: 99%)
“…SVCP4C (SonarCloud Vulnerable Code Prospector for C) is an online tool for collecting vulnerable source code from open-source repositories linked to SonarCloud. The tool performs static analysis and labels potentially vulnerable source code at the file level [20]. The Devign [21] dataset includes four real-world open-source C/C++ projects (Linux, FFmpeg, Qemu, and Wireshark), labeled using security-related keyword filtering.…”
Section: Related Work (mentioning, confidence: 99%)
“…Raducu et al. (Raducu et al. 2020) emphasize that different machine learning techniques have emerged and been developed to detect security vulnerabilities. However, they state that the performance of these algorithms depends on data-driven engines that rely on processing large amounts of data, known as datasets.…”
Section: Introduction (unclassified)
“…Studies in recent years recommend the use of machine learning-based approaches. However, as the literature reviews show, achieving high performance clearly requires studies to use structured datasets with extracted features (Ghaffarian and Shahriari 2017; Miyamoto, Yamamoto, and Nakayama 2017; Raducu et al. 2020; Theisen and Williams 2020; Wu et al. 2020). The NVD dataset, which shapes the field, cannot provide this because of its structure.…”
Section: Introduction (unclassified)