Automated Software Vulnerability Assessment with Concept Drift

Le, Triet Huynh Minh; Sabir, Bushra; Babar, Muhammad Ali

doi:10.1109/msr.2019.00063

Cited by 29 publications

(21 citation statements)

References 46 publications

“…One such task that has been regularly investigated is severity prediction [5]. … selected stage for prediction, e.g., for bug reports [9] or SV databases [18]. Through this RQ, we display the impacts on prediction performance that dataset selection can have, and hence motivate researchers to properly consider this issue.…”

Section: A Research Questions (mentioning)

confidence: 99%

“…Older users may be better at assessing SVs. … data as input [9], [18], to predict the normalized severity categories described in Table IV. Following these practices, we preprocessed text descriptions through removal of stop words (using the NLTK and sklearn stopword list) and punctuation, conversion to lowercase, and stemming.…”

Section: Reporter Profile Age (mentioning)

confidence: 99%

“…Following these practices, we preprocessed text descriptions through removal of stop words (using the NLTK and sklearn stopword list) and punctuation, conversion to lowercase, and stemming. The descriptions were then encoded using a bag-of-words model; we only extracted features for words that appeared in more than 0.1% of all descriptions [18]. We evaluated the same classifiers and tuned the same hyperparameters that were experimented with by prior works [18].…”

Section: Reporter Profile Age (mentioning)

confidence: 99%
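
The encoding step quoted above (lowercasing, punctuation and stop-word removal, then a bag-of-words model restricted to words that occur in more than a fixed share of descriptions) can be sketched roughly as follows. This is a minimal illustration, not the cited authors' code: the stop-word list is a tiny placeholder, stemming is omitted, and the names `tokenize`, `build_vocabulary`, and `bag_of_words` are hypothetical. The toy call uses a 50% threshold so the filter is visible on three documents; the quoted study used 0.1%.

```python
import re
from collections import Counter

# Placeholder stop-word list (assumption); the quoted study used the NLTK and sklearn lists.
STOP_WORDS = {"a", "an", "the", "in", "of", "to", "and", "is", "via"}


def tokenize(text):
    """Lowercase, strip punctuation, and drop stop words (stemming is omitted here)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def build_vocabulary(docs, min_df_ratio=0.001):
    """Keep only words that appear in more than min_df_ratio of all documents."""
    doc_freq = Counter()
    for doc in docs:
        # Count each word once per document (document frequency, not term frequency).
        doc_freq.update(set(tokenize(doc)))
    threshold = min_df_ratio * len(docs)
    return sorted(w for w, df in doc_freq.items() if df > threshold)


def bag_of_words(doc, vocabulary):
    """Encode one document as a vector of term counts over the vocabulary."""
    counts = Counter(tokenize(doc))
    return [counts[w] for w in vocabulary]


docs = [
    "Buffer overflow in the parser allows remote code execution.",
    "SQL injection via the login form.",
    "Buffer overflow in image decoder.",
]
vocab = build_vocabulary(docs, min_df_ratio=0.5)
vectors = [bag_of_words(d, vocab) for d in docs]
```

On a realistic corpus the default `min_df_ratio=0.001` corresponds to the 0.1% cut-off mentioned in the excerpt.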

“…The descriptions were then encoded using a bag-of-words model; we only extracted features for words that appeared in more than 0.1% of all descriptions [18]. We evaluated the same classifiers and tuned the same hyperparameters that were experimented with by prior works [18]. Full details of the experimental setup are available in our reproduction package [10].…”

Section: Reporter Profile Age (mentioning)

confidence: 99%

“…Full details of the experimental setup are available in our reproduction package [10]. Time-based validation methods have been shown to be important for this line of research [18]. Hence, we sorted our dataset by submission date of the report description, and then divided the dataset into a training, validation and test set through an 80:10:10 split.…”

Section: Reporter Profile Age (mentioning)

confidence: 99%
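
The time-based validation described in the excerpt above (sort by submission date, then cut into 80:10:10 training, validation, and test sets) can be sketched as below. This is an assumed, simplified version: the function name `time_based_split`, the record layout, and the `submitted` field are hypothetical, and the cited reproduction package may handle rounding and ties differently.

```python
from datetime import date


def time_based_split(records, date_key, ratios=(0.8, 0.1, 0.1)):
    """Sort records chronologically, then cut them into train/validation/test.

    No shuffling is done, so every test record is newer than every
    training record, which is what guards against leaking future data.
    """
    ordered = sorted(records, key=date_key)
    n = len(ordered)
    train_end = int(n * ratios[0])
    val_end = train_end + int(n * ratios[1])
    # Whatever remains after the first two cuts becomes the test set.
    return ordered[:train_end], ordered[train_end:val_end], ordered[val_end:]


# Toy reports with out-of-order submission dates (illustrative data only).
reports = [
    {"id": i, "submitted": date(2020, 1, day)}
    for i, day in enumerate([5, 1, 9, 3, 7, 2, 10, 4, 8, 6])
]
train, val, test = time_based_split(reports, date_key=lambda r: r["submitted"])
```

Because the cut points follow the sorted order, the oldest 80% of reports train the model and the most recent 10% are held out for testing.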

An Investigation into Inconsistency of Software Vulnerability Severity across Data Sources

Croft, Babar, Li

2021

Preprint

Software Vulnerability (SV) severity assessment is a vital task for informing SV remediation and triage. Ranking of SV severity scores is often used to advise prioritization of patching efforts. However, severity assessment is a difficult and subjective manual task that relies on expertise, knowledge, and standardized reporting schemes. Consequently, different data sources that perform independent analysis may provide conflicting severity rankings. Inconsistency across these data sources affects the reliability of severity assessment data, and can consequently impact SV prioritization and fixing. In this study, we investigate severity ranking inconsistencies over the SV reporting lifecycle. Our analysis helps characterize the nature of this problem, identify correlated factors, and determine the impacts of inconsistency on downstream tasks. Our findings observe that SV severity often lacks consideration or is underestimated during initial reporting, and such SVs consequently receive lower prioritization. We identify six potential attributes that are correlated to this misjudgment, and show that inconsistency in severity reporting schemes can severely degrade the performance of downstream severity prediction by up to 77%. Our findings help raise awareness of SV severity data inconsistencies and draw attention to this data quality problem. These insights can help developers better consider SV severity data sources, and improve the reliability of consequent SV prioritization. Furthermore, we encourage researchers to provide more attention to SV severity data selection.

Automatic software vulnerability classification by extracting vulnerability triggers

Sun et al. 2022

J Software Evolu Process

Vulnerability classification is a significant activity in software development and software maintenance. Natural Language Processing (NLP) techniques, which utilize the descriptions in public repositories, are widely used in automatic software vulnerability classification. However, vulnerability descriptions are ordinarily short and contain many technical terms, making them difficult for machines to automatically comprehend. In this paper, we present an approach based on vulnerability triggers to automatically classify vulnerabilities. First, we extract vulnerability triggers with Bert Question and Answer (Bert Q&A). Then, we use Recurrent Convolutional Neural Networks for Text classification (TextRCNN) to classify vulnerabilities based on Common Weakness Enumeration (CWE). We statistically perform an analysis of vulnerability triggers and comprehensively evaluate the classification performance of our approach on a set of 4769 prelabeled vulnerability entries, as well as compare it with state‐of‐the‐art vulnerability classification approaches. Experiment results show that our approach can achieve a F1‐measure of 95% on extraction and 80.8% on classification.
