2018
DOI: 10.1155/2018/4947695

Automatic Benchmark Generation Framework for Malware Detection

Abstract: To address emerging security threats, various malware detection methods have been proposed every year. Therefore, a small but representative set of malware samples are usually needed for detection model, especially for machine-learning-based malware detection models. However, current manual selection of representative samples from large unknown file collection is labor intensive and not scalable. In this paper, we firstly propose a framework that can automatically generate a small data set for malware detectio…

Cited by 8 publications (7 citation statements)
References 16 publications

“…So far, malware classification methods mainly focus on feature engineering, which need to extract features from malware or visualization images, for example, API calls [6,7], system calls [8,9], and opcode sequences [10,11]. The malware classification framework in our work is closely related to opcode sequences and SVM.…”
Section: Malware Classification
Citation type: mentioning
confidence: 99%
“…In our experiments, two groups of PE files are collected to form a dataset. The first group is the malware set containing 5,200 malicious files from three projects: VirusShare [42], DAS [43], and malwarebenchmark [44], where each file is labelled with 1. The second group is the benign set containing 5,150 benign files from the pure version of Windows XP (32bit/SP3), Windows 7 ultimate (64bit/SP1), Windows 8.1 (64bit) image and more than 30 software companies, where each file is labelled with 0.…”
Section: A Dataset
Citation type: mentioning
confidence: 99%
“…In addition to the sample analysis method, the sampling method used in the construction of small-scale test sample set is as important as the data source. The genetic algorithm-based roulette sampling algorithm adopted by Liang et al. [3] has a high time complexity problem, and the sandbox-based behavioral characteristics detection scheme would result in inaccurate analysis because the sample did not fully expose all behaviors.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
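
For readers unfamiliar with the term, the statement above refers to roulette sampling. The following is a minimal sketch of plain roulette-wheel (weight-proportionate) selection without replacement. It is not the genetic-algorithm-based procedure of Liang et al. [3]; the function name `roulette_sample`, the sample identifiers, and the weights are hypothetical and chosen only for illustration.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def roulette_sample(items, weights, k, seed=None):
    """Pick k distinct items; each draw is proportional to the item's weight.

    Illustrative only: 'weights' stands in for whatever fitness or
    representativeness score a real sampler would assign to each sample.
    """
    if len(items) != len(weights):
        raise ValueError("items and weights must have the same length")
    if k > len(items):
        raise ValueError("cannot draw more items than are available")
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be strictly positive")

    rng = random.Random(seed)
    pool = list(items)
    pool_weights = list(weights)
    chosen = []

    for _ in range(k):
        # Rebuild the "wheel": cumulative weights of the remaining items.
        cumulative = list(accumulate(pool_weights))
        # Spin: a uniform point on [0, total weight).
        spin = rng.random() * cumulative[-1]
        # Find the slice of the wheel that the point falls into.
        index = min(bisect_right(cumulative, spin), len(pool) - 1)
        chosen.append(pool.pop(index))
        pool_weights.pop(index)

    return chosen


if __name__ == "__main__":
    # Hypothetical sample identifiers and scores.
    samples = ["s1", "s2", "s3", "s4", "s5"]
    scores = [0.5, 1.0, 2.0, 4.0, 8.0]
    print(roulette_sample(samples, scores, k=3, seed=42))
```

Because the wheel is rebuilt after every draw, this sketch costs O(k·n) for k draws from n items, which gives a rough sense of why repeated weighted sampling over a large collection can become expensive.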
“…Different antivirus softwares have different ability to identify and definite malicious programs [2]. Therefore, constructing a high-quality test set can adequately evaluate the performance of antivirus software and help improve antivirus strategies [3], [4]. The test sample set should be characterized by appropriate volume, rich variety, and retention of the original dataset density distribution characteristics.…”
Section: Introduction
Citation type: mentioning
confidence: 99%