2018
DOI: 10.1007/978-3-319-99133-7_22
Toward Validation of Textual Information Retrieval Techniques for Software Weaknesses

Abstract: This paper presents a preliminary validation of common textual information retrieval techniques for mapping unstructured software vulnerability information to distinct software weaknesses. The validation is carried out with a dataset compiled from four software repositories tracked in the Snyk vulnerability database. According to the results, the information retrieval techniques used perform unsatisfactorily compared to regular expression searches. Although the results vary from one repository to another, the pr…

Cited by 13 publications (9 citation statements)
References 31 publications
“…The following cut JSON (JavaScript Object Notation) excerpt can be used to illustrate a rather typical entry in Safety DB: As can be seen, the advisory field provides a brief textual description for each vulnerability archived to the database. These descriptions follow the typically terse prose used for describing vulnerabilities [12]. (It is also worth remarking that the textual advisories in Safety DB are mostly plagiarized directly from NVD and related sources.)…”
Section: A Sources (mentioning)
confidence: 99%
“…These limitations have prompted a new branch of research for examining vulnerabilities in software repositories. While packages used in Linux distributions have been a common target [10], the more recent research has focused on language-specific repositories such as npm for JavaScript [11], [12]. This is the research domain to which this paper contributes by presenting the supposedly first study on vulnerabilities in the Python's PyPI repository and advancing the understanding on release-based time series analysis of software vulnerabilities.…”
Section: Introduction (mentioning)
confidence: 99%
“…In addition to the stopwords supplied in the library, the twelve most frequent tokens were used as custom excluded stopwords: data, article, personal, protection, processing, company, authority, regulation, information, case, art, and page. After this pre-processing, the token-based term frequency (TF) and term frequency inverse document frequency (TF-IDF) were calculated from the whole corpus constructed (for the exact formulas used see, e.g., [19]). These common information retrieval statistics are used for evaluating the other part in Q2.…”
Section: Methods (mentioning)
confidence: 99%
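The pre-processing and weighting described in the statement above can be illustrated with a minimal sketch. It is not the citing authors' implementation: the tokenizer, the toy documents, and the helper names (`tokenize`, `term_frequencies`, `tf_idf`) are hypothetical, and the weighting shown (raw count times ln(N/df)) is only one common TF-IDF variant, since the quoted passage defers the exact formulas to its reference [19]. Only the twelve custom stopwords are taken from the quote.

```python
import math
import re
from collections import Counter

# The twelve custom stopwords listed in the quoted passage.
CUSTOM_STOPWORDS = {
    "data", "article", "personal", "protection", "processing", "company",
    "authority", "regulation", "information", "case", "art", "page",
}


def tokenize(text, stopwords=CUSTOM_STOPWORDS):
    """Lowercase, keep alphabetic runs, drop stopwords (assumed scheme)."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in stopwords]


def term_frequencies(docs):
    """Raw per-document token counts (TF)."""
    return [Counter(tokenize(doc)) for doc in docs]


def tf_idf(docs):
    """TF-IDF with idf = ln(N / df); an assumed, commonly used variant."""
    tfs = term_frequencies(docs)
    n_docs = len(docs)
    doc_freq = Counter()
    for tf in tfs:
        doc_freq.update(tf.keys())  # count each term once per document
    return [
        {term: count * math.log(n_docs / doc_freq[term])
         for term, count in tf.items()}
        for tf in tfs
    ]


# Hypothetical toy corpus, for illustration only.
docs = [
    "The company was fined for unlawful data processing and missing consent.",
    "Missing consent led to a fine; the authority cited the regulation.",
    "A breach notification was delayed.",
]
weights = tf_idf(docs)
```

In this variant, a token that occurs in every document receives a weight of zero, while a token confined to a single document receives the largest weight per occurrence.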
“…Another study defined a framework to prioritize vulnerabilities [19]. Several studies have focused on mining methods and information retrieval for a security knowledge repository [20], [21], [22], [23], [24]. These papers mined each repository using their relationships.…”
Section: Related Work (mentioning)
confidence: 99%