Accurate Adware Detection Using Opcode Sequence Extraction

Shahzad, Raja Khurram; Lavesson, Niklas; Johnson, Henric

doi:10.1109/ares.2011.35

Cited by 23 publications

(8 citation statements)

References 13 publications


“…In Table 1, we show statistics on the number of unique n-gram opcodes obtained from our dataset, which has a total of 2000 benign and malware samples. Our findings attest to other research [7,8,25] that the number of unique n-grams increases proportionally to the size of n. Since machine learning classifiers only understand features in numerical representations, we vectorize each sample's n-gram opcode sequences using the term frequency-inverse document frequency (TF-IDF) [26,27]. TF-IDF works by creating a dictionary of unique n-gram opcode sequences and then measures the frequency of occurrence of each unique n-gram opcode within a given sample using the term frequency (TF) and with inverse document frequency (IDF), measures the importance of the unique n-gram opcode on the basis of frequency of occurrence across the entire corpus.…”

Section: Fig 2 Example Of N-gram Opcode Sequences Generation (supporting)

confidence: 92%
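
The vectorization step described in the statement above can be sketched in plain Python. This is a minimal illustration, not the citing authors' implementation: the helper names `opcode_ngrams` and `tfidf_vectors`, the toy opcode sequences, and the particular TF-IDF variant (TF as relative frequency, IDF as log(N/df), no smoothing) are all assumptions made for the example.

```python
import math
from collections import Counter


def opcode_ngrams(opcodes, n):
    """Return all contiguous n-grams (as tuples) of an opcode sequence."""
    return [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


def tfidf_vectors(samples, n):
    """Build a dictionary of unique n-gram opcodes and one TF-IDF vector per sample.

    Assumed variant: tf = count / total n-grams in the sample,
    idf = log(number of samples / samples containing the n-gram).
    """
    counts_per_sample = [Counter(opcode_ngrams(s, n)) for s in samples]
    # Dictionary of unique n-gram opcodes across the whole corpus.
    vocab = sorted(set().union(*counts_per_sample))
    n_docs = len(samples)
    # Document frequency: in how many samples each n-gram occurs.
    df = {g: sum(1 for c in counts_per_sample if g in c) for g in vocab}
    vectors = []
    for counts in counts_per_sample:
        total = sum(counts.values()) or 1  # guard against empty samples
        vectors.append(
            [(counts[g] / total) * math.log(n_docs / df[g]) for g in vocab]
        )
    return vocab, vectors


# Toy corpus (hypothetical opcode sequences, one list per sample).
samples = [
    ["push", "mov", "call", "pop", "ret"],
    ["push", "mov", "xor", "call", "ret"],
]
vocab, vectors = tfidf_vectors(samples, 2)
```

With this variant, an n-gram that occurs in every sample gets an IDF of zero, so only n-grams that distinguish samples carry weight. Library implementations typically add smoothing and normalization, so their values will differ from this sketch.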

Malware Detection Using Ensemble N-gram Opcode Sequences

Yeboah, Amuquandoh, Musah

2021

Int. J. Interact. Mob. Technol.


Conventional approaches to tackling malware attacks have proven futile at detecting never-before-seen (zero-day) malware. Research, however, has shown that zero-day malicious files are mostly semantic-preserving variants of already existing malware, which are generated via obfuscation methods. In this paper we propose and evaluate a machine learning based malware detection model using an ensemble approach. We employ an ensemble strategy in which multiple feature sets generated from different n-gram sizes of opcode sequences are trained using a single classifier. Model predictions on the trained multi-feature sets are weighted and averaged to make a final verdict on whether a binary file is malicious or benign. To obtain the optimal weight combination for the ensemble feature sets, we applied a grid search on a set of pre-defined weights in the range 0 to 1. With a balanced dataset of 2000 samples, an ensemble of n-gram opcode sequences of sizes n = 1 and n = 2 with respective weights 0.3 and 0.7 yielded the best detection accuracy of 98.1% using a random forest (RF) classifier. The ensemble of n-gram sizes 2 and 3 obtained the best precision, 99.7%, using a weight of 0.5 for both models.
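
The weighted averaging and grid search described in this abstract can be sketched as follows. This is a rough illustration under stated assumptions rather than the authors' code: the function names, the toy probabilities, the 0.5 decision threshold, the grid step of 0.1, and the constraint that the two weights sum to 1 are all hypothetical choices (the abstract only says the weights lie in the range 0 to 1).

```python
def accuracy(probs, labels, threshold=0.5):
    """Fraction of samples whose thresholded probability matches the label."""
    preds = [1 if p >= threshold else 0 for p in probs]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def weighted_average(probs_a, probs_b, w_a, w_b):
    """Combine two models' malware probabilities with the given weights."""
    return [w_a * a + w_b * b for a, b in zip(probs_a, probs_b)]


def grid_search_weights(probs_a, probs_b, labels, steps=10):
    """Try weight pairs (w_a, 1 - w_a) on a regular grid; keep the most accurate.

    Assumption: the weights sum to 1. Ties keep the first pair found.
    """
    best = (None, None, -1.0)
    for i in range(steps + 1):
        w_a = i / steps
        w_b = 1.0 - w_a
        acc = accuracy(weighted_average(probs_a, probs_b, w_a, w_b), labels)
        if acc > best[2]:
            best = (w_a, w_b, acc)
    return best


# Hypothetical validation data: 1 = malicious, 0 = benign.
labels = [1, 0, 1, 0]
# Hypothetical malware probabilities from models trained on two n-gram sizes.
probs_a = [0.9, 0.7, 0.45, 0.1]
probs_b = [0.4, 0.2, 0.8, 0.6]

w_a, w_b, best_acc = grid_search_weights(probs_a, probs_b, labels)
```

In this toy data each model alone classifies half of the samples correctly, while some weighted combinations classify all four correctly, which is the effect the ensemble is meant to exploit. In practice the weights would be selected on held-out data rather than on the samples used for the final evaluation.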


“…Besides CNN, RNN has also been used for malware analysis. [28] and [29] proposed techniques with LSTM using opcode sequences of malware. Santos et al [30] proposed a hybrid technique by integrating both static and dynamic analysis.…”

Section: Related Work (mentioning)

confidence: 99%

MALIGN: Adversarially Robust Malware Family Detection using Sequence Alignment

Saha, Afroz, Rahman

2021

Preprint


We propose MALIGN, a novel malware family detection approach inspired by genome sequence alignment. MALIGN encodes malware using four nucleotides and then uses genome sequence alignment approaches to create a signature of a malware family based on the code fragments conserved in the family, making it robust to evasion by modification and addition of content. Moreover, unlike previous approaches based on sequence alignment, our method uses a multiple whole-genome alignment tool that protects against adversarial attacks such as code insertion, deletion or modification. Our approach outperforms state-of-the-art machine learning based malware detectors and demonstrates robustness against trivial adversarial attacks. MALIGN also helps identify the techniques malware authors use to evade detection.


“…As the model is trained on disassembled virus executables, the quality of the disassembler may affects the results [22]. Other than that, since it just retrieves part of the program, it may miss some important information of malicious code [23]. But the computation time is faster as the data size is smaller.…”

Section: Related Work (mentioning)

confidence: 99%

Obfuscated computer virus detection using machine learning algorithm

Xin, Ismail, Khammas

2019

Bulletin EEI


Computer virus attacks are becoming increasingly advanced. New obfuscated computer viruses created by virus writers automatically generate a new form of themselves on every iteration and download. These constantly evolving viruses pose a significant threat to the information security of computer users, organizations and even governments. Signature-based detection, the technique used by conventional antivirus software on the market, fails to identify them because signatures are unavailable. This research proposes an alternative to the traditional signature-based detection method and investigates the use of machine learning for obfuscated computer virus detection. In this work, text strings extracted from virus program code are used as features to build a classifier model that can correctly classify obfuscated virus files. Text string features are used because they are informative and potentially use only a small amount of memory. Results show that unknown files can be correctly classified with 99.5% accuracy using an SMO classifier model. Thus, it is believed that current computer virus defenses can be strengthened through a machine learning approach.
