The analysis and grading of software vulnerabilities is an important process that is currently performed manually by experts. As a result, the process involves time delays, human error, and excessive cost. The end product of an expert's vulnerability report is a severity score and a severity rating. The severity rating is the most important attribute of a vulnerability. Only 20% of all vulnerabilities can be exploited, and the vast majority of exploitations take place within the first two weeks. It is therefore imperative to determine the severity rating without delay. Our proposed model combines statistical feature extraction methods and deep learning-based word embedding methods from natural language processing with machine learning algorithms that perform multi-class classification. The statistical methods Bag of Words, Term Frequency Inverse Document Frequency, and N-gram were used for feature extraction. The Word2Vec, Doc2Vec, and FastText algorithms were included for deep learning-based word embedding. In the classification stage, Naive Bayes, Decision Tree, K-Nearest Neighbors, Multi-Layer Perceptron, and Random Forest, all of which support multi-class classification, were used. In this respect, our model is a hybrid method. The database used is publicly available and is the most reliable data set in the field. The results obtained in our study are quite promising. By assisting experts in this field, the model will speed up these procedures. In addition, in terms of data size and the scoring systems covered, ours is one of the first studies to include the latest versions.
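The statistical branch of such a pipeline can be illustrated with a minimal sketch: bag-of-words features over word unigrams and bigrams, fed to a multinomial Naive Bayes classifier that predicts a severity rating. This is not the authors' implementation. The training descriptions, their labels, and the names `ngrams`, `NaiveBayes`, and `TRAIN` are all hypothetical, and the abstract does not state which smoothing or tokenization was used; add-one smoothing and whitespace tokenization are assumed here.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n_max=2):
    """Lowercase, split on whitespace, and return all word n-grams up to n_max."""
    tokens = text.lower().split()
    feats = []
    for n in range(1, n_max + 1):
        feats.extend(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing (an assumption)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)          # documents per class
        self.feature_counts = defaultdict(Counter)   # n-gram counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            feats = ngrams(text)
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        feats = [f for f in ngrams(text) if f in self.vocab]  # skip unseen n-grams
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            score += sum(math.log((counts[f] + 1) / denom) for f in feats)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy descriptions; a real study would use a public vulnerability database.
TRAIN = [
    ("remote attacker can execute arbitrary code via crafted packet", "CRITICAL"),
    ("buffer overflow allows remote code execution without authentication", "CRITICAL"),
    ("sql injection allows remote attacker to read database contents", "HIGH"),
    ("authenticated user can escalate privileges via crafted request", "HIGH"),
    ("cross site scripting allows injection of web script in admin page", "MEDIUM"),
    ("denial of service via malformed input requires user interaction", "MEDIUM"),
    ("local user can read log file containing non sensitive information", "LOW"),
    ("information disclosure of version string to local user", "LOW"),
]

model = NaiveBayes().fit([t for t, _ in TRAIN], [y for _, y in TRAIN])
print(model.predict("remote attacker can execute arbitrary code"))
```

Swapping the feature extractor (for example, TF-IDF weights or averaged word embeddings) or the classifier changes only one stage of the sketch, which is the sense in which such a pipeline can be called hybrid.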
To protect information systems against threats and vulnerabilities, security breaches should be analyzed. For this purpose, analysts primarily conduct intelligence research through open-source systems, and vulnerability databases stand out as the most preferred references at this stage. Our study can serve as a main reference for the verification of vulnerability analysis. It will assist in planning testing processes, patches, and updates during software development. Moreover, it offers a perspective on the field, enabling readers to understand the concepts of software security and vulnerability databases. The diversity of vulnerability databases brings unique advantages, but it has also led to some disadvantages. Our study focuses on the reasons behind the creation of different databases and clearly sets out their advantages and disadvantages. First, the databases in use were determined by examining academic studies in the field of software security vulnerabilities. Twelve different databases used in the literature were identified. Among these, only the ones that are current and accessible to researchers were selected. As a result of this screening process, seven databases were included in this study. The selected databases were examined and explained in detail, and then compared according to defined criteria. The data obtained from the comparison are presented in detail. This study thus presents a systematic review of up-to-date, accessible vulnerability databases that are widely used in the literature, to help researchers decide which database to use.
Detection and analysis of software vulnerabilities is a very important consideration. For this reason, the software security vulnerabilities identified over many years have been listed, and attempts have been made to classify them. Today this process is performed manually by experts, which takes time and is costly. Many methods have been proposed for the reporting and classification of software security vulnerabilities. The Common Vulnerability Scoring System is the one officially used for this purpose today. The scoring system is constantly updated to cover the different security vulnerabilities included in the system, in line with the changing perception of security and newly developed technologies. As a result, different versions of the scoring system are used in vulnerability reports. To add a newly published version of the scoring system to old vulnerability reports, all analyses must be repeated manually and retroactively in accordance with the new security framework. This requires a great deal of resources, time, and expert skill. For this reason, many scoring system values are missing from the database. The aim of this study is to estimate the missing security metrics of vulnerability reports using natural language processing and machine learning algorithms. For this purpose, a model using term frequency inverse document frequency and the K-Nearest Neighbors algorithm is proposed. In addition, the resulting data have been made available to researchers as a new database. The results obtained are quite promising. A publicly available database that researchers accept as a reference was chosen as the data set, which facilitates the evaluation and analysis of our model. To the best of our knowledge, this study was performed with the largest dataset size available from this database, and it is one of the few studies on the latest version of the official scoring system published for the classification of software security vulnerabilities. For these reasons, our study is a comprehensive and original contribution to the field.
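The combination of term frequency inverse document frequency and K-Nearest Neighbors described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the published model: the reports and labels are hypothetical, the predicted metric (the Common Vulnerability Scoring System attack vector, reduced here to NETWORK or LOCAL) is chosen only as an example of a missing metric, and the smoothed inverse document frequency formula, cosine similarity, and `k=3` are assumptions not given in the abstract.

```python
import math
from collections import Counter


def vectorize(tokens, idf):
    """Map tokens to a sparse TF-IDF vector (dict), ignoring unknown terms."""
    tf = Counter(t for t in tokens if t in idf)
    return {t: count * idf[t] for t, count in tf.items()}


def tfidf_vectors(docs):
    """Return (vectors, idf) for a list of documents."""
    tokenized = [d.lower().split() for d in docs]
    df = Counter(term for toks in tokenized for term in set(toks))
    n = len(docs)
    # Smoothed idf (an assumption) keeps terms found in every document non-zero.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return [vectorize(toks, idf) for toks in tokenized], idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(query, docs, labels, k=3):
    """Predict a label by majority vote among the k most similar documents."""
    vectors, idf = tfidf_vectors(docs)
    q = vectorize(query.lower().split(), idf)
    ranked = sorted(range(len(docs)), key=lambda i: cosine(q, vectors[i]), reverse=True)
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical reports whose attack vector metric is already known.
REPORTS = [
    ("remote attacker sends crafted http request to execute code", "NETWORK"),
    ("remote attacker can cause denial of service via crafted packet", "NETWORK"),
    ("crafted http request allows remote sql injection", "NETWORK"),
    ("local user gains root privileges via symlink attack", "LOCAL"),
    ("local user can read sensitive file through insecure permissions", "LOCAL"),
    ("local privileges escalation via insecure temporary file", "LOCAL"),
]
DOCS = [text for text, _ in REPORTS]
LABELS = [label for _, label in REPORTS]

# Estimate the missing metric for an older report that lacks it.
print(knn_predict("remote attacker sends crafted packet", DOCS, LABELS))
```

In this sketch each missing metric would be predicted by its own call with its own label list, so an older report can be filled in one metric at a time from the reports that already carry that metric.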