2014
DOI: 10.1109/tse.2014.2340398

Predicting Vulnerable Software Components via Text Mining

Abstract: This paper presents an approach based on machine learning to predict which components of a software application contain security vulnerabilities. The approach is based on text mining the source code of the components. Namely, each component is characterized as a series of terms contained in its source code, with the associated frequencies. These features are used to forecast whether each component is likely to contain vulnerabilities. In an exploratory validation with 20 Android applications, we discovered tha…
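
The idea in the abstract (terms and their frequencies as features, fed to a classifier) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the tokenizer regex, the `NaiveBayes` class, the label strings, and the two toy Java-like snippets are all assumptions made up for the example. Naive Bayes is used here only because it is simple to write with the standard library.

```python
import math
import re
from collections import Counter


def term_frequencies(source: str) -> Counter:
    """Count identifier-like tokens in a component's source text (assumed tokenizer)."""
    return Counter(re.findall(r"[A-Za-z_][A-Za-z_0-9]*", source))


class NaiveBayes:
    """Tiny multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.term_counts = {c: Counter() for c in self.classes}
        for tf, label in zip(docs, labels):
            self.term_counts[label].update(tf)
        # Vocabulary: every term seen in any training component.
        self.vocab = set().union(*self.term_counts.values())
        return self

    def predict(self, tf):
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for c in self.classes:
            total_terms = sum(self.term_counts[c].values())
            score = math.log(self.doc_counts[c] / total_docs)
            for term, n in tf.items():
                if term not in self.vocab:
                    continue  # ignore terms never seen during training
                p = (self.term_counts[c][term] + 1) / (total_terms + len(self.vocab))
                score += n * math.log(p)
            if score > best_score:
                best, best_score = c, score
        return best


# Hypothetical toy training data: one component per label.
vulnerable_src = 'String q = "SELECT * FROM t WHERE id=" + request.getParameter("id"); stmt.executeQuery(q);'
clean_src = "int sum = 0; for (int i = 0; i < n; i++) { sum += values[i]; } return sum;"

model = NaiveBayes().fit(
    [term_frequencies(vulnerable_src), term_frequencies(clean_src)],
    ["vulnerable", "clean"],
)

unseen = 'stmt.executeQuery("SELECT name FROM u WHERE k=" + request.getParameter("k"));'
print(model.predict(term_frequencies(unseen)))
```

A real study would train on many labeled components per application; two toy snippets only show the shape of the pipeline.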

Cited by 281 publications (195 citation statements)
References 34 publications
“…The results show that the approach had good precision and recall when used for prediction within a single project. Walden et al [28] confirmed that the vulnerability prediction technique based on text mining (described in [21]) could be more accurate than models based on software metrics. They have collected a dataset of PHP vulnerabilities for three open source web applications by mining the NVD and security announcements of those applications.…”
Section: Related Work (mentioning)
confidence: 99%
“…Massacci and Nguyen [14] provide a comprehensive survey and independent empirical validation of several vulnerability discovery models. Several other metrics have been used: code complexity metrics [25,24,16], developer activity metrics [24], static analysis defect densities [27], frequencies of occurrence of programming constructs [21,28], etc. We illustrate some representative cases in Table 2.…”
Section: Related Work (mentioning)
confidence: 99%
“…Although it is a relatively new area of research, a great number of VPMs has already been proposed in the related literature. As stated in [9], the main VPMs that can be found in the literature utilize software metrics [13][14][15][16][17][18][19][20][21][22], text mining [23][24][25][26][27][28], and security-related static analysis alerts [10], [29][30][31][32] to predict vulnerabilities. These types of VPMs are analyzed in the rest of this section.…”
Section: Vulnerability Prediction Modeling (mentioning)
confidence: 99%
“…An empirical evaluation on 19 versions of a large-scale Android application, revealed that their technique may be promising for vulnerability prediction, as the produced predictors achieved sufficient precision (85% on average) and recall (87% on average). Based on these preliminary results, the same authors conducted a more elaborate empirical study to investigate the validity of their approach [25]. In particular, several VPMs using Naïve Bayes and Random Forest algorithms were constructed and evaluated on a code base of 20 large-scale Android applications.…”
Section: Vulnerability Prediction Modeling (mentioning)
confidence: 99%
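
The precision and recall figures quoted in the last statement follow the standard definitions, which can be sketched as below. The function name and the sample predictions are hypothetical and are not data from any of the cited studies.

```python
def precision_recall(predicted, actual):
    """Return (precision, recall) for boolean 'vulnerable' predictions.

    precision = TP / (TP + FP): share of flagged components that are truly vulnerable.
    recall    = TP / (TP + FN): share of vulnerable components that were flagged.
    """
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up example: four components, two flagged, two truly vulnerable.
predicted = [True, True, False, False]
actual = [True, False, True, False]
print(precision_recall(predicted, actual))
```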