Unknown Malcode Detection Using OPCODE Representation

Moskovitch, Robert; Feher, Clint; Tzachar, Nir; Berger, Eugene; Gitelman, Marina; Dolev, Shlomi; Elovici, Yuval

doi:10.1007/978-3-540-89900-6_21

Cited by 143 publications

(114 citation statements)

References 12 publications

“…Known feature sets that have already been used in the past to detect malicious programs: n-grams [4], opcodes [5], Android permissions combined with Control Flow Graphs [6] and several others. Finding the feature set that generalizes the most our observable is the most challenging task.…”

Section: A Feature Extraction (mentioning)

confidence: 99%

“…Dynamic approaches must take into account the multi-entry points issue due to the component-based paradigm of Android, whereas static approaches must deal with known obfuscation techniques. In this paper, we propose a static approach combining opcode-sequences and machine learning techniques.…”

Section: Introduction (mentioning)

confidence: 99%

Using opcode-sequences to detect malicious Android applications

Jérôme, Allix, State, et al. 2014

2014 IEEE International Conference on Communications (ICC)

Abstract: Recently, the Android platform has seen its number of malicious applications increase sharply. Motivated by the easy application submission process and the number of alternative marketplaces for distributing Android applications, rogue authors are constantly developing new malicious programs. While current anti-virus software mainly relies on signature detection, the issue of alternative malware detection has to be addressed. In this paper, we present a feature-based detection mechanism relying on opcode-sequences combined with machine learning techniques. We assess our tool on both a reference dataset known as the Genome Project and a wider sample of 40,000 applications retrieved from the Google Play Store.

“…The reason of not extracting further opcode-sequence lengths is that the underlying complexity of the feature selection step and the huge amount of features obtained would render the extraction very slow. Besides, an opcode-sequence length of 2 has proven to be the best configuration in a previous work (Moskovitch et al, 2008a).…”

Section: Empirical Study (mentioning)

confidence: 91%
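To make the notion of an opcode-sequence length concrete, here is a minimal Python sketch of extracting length-2 opcode sequences and their relative frequencies. The function name `opcode_ngram_frequencies`, the sample opcode stream, and the choice of plain relative frequency as the feature value are all assumptions made for illustration; the cited works may weight or select features differently.

```python
from collections import Counter
from typing import Dict, List, Tuple


def opcode_ngram_frequencies(
    opcodes: List[str], n: int = 2
) -> Dict[Tuple[str, ...], float]:
    """Return the relative frequency of each length-n opcode sequence."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Slide a window of width n over the opcode stream.
    grams = [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]
    total = len(grams)
    if total == 0:
        # The stream is shorter than n, so no sequence can be formed.
        return {}
    counts = Counter(grams)
    return {gram: count / total for gram, count in counts.items()}


# Hypothetical opcode stream, such as a disassembler might produce.
sample = ["push", "mov", "push", "mov", "call"]
freqs = opcode_ngram_frequencies(sample, n=2)
```

The number of distinct sequences can grow quickly as `n` increases, which is consistent with the statement above that longer sequence lengths make feature extraction and selection slow.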

“…In a previous work Moskovitch et al (2008a), a larger dataset was employed to validate the model. We did not use a larger training dataset because of technical limitations.…”

Section: Empirical Study (mentioning)

confidence: 99%

“…Additionally, opcode sequences have recently been introduced as an alternative to byte n-grams (Dolev and Tzachar, 2008; Santos et al, 2010; Moskovitch et al, 2008a). This approach appears to be theoretically better because it relies on source code rather than the bytes of a binary file (Christodorescu, 2007) (for a more detailed review of static features for machine-learning unknown malware detection refer to Shabtai et al (2009)).…”

Section: Introduction (mentioning)

confidence: 99%

Using opcode sequences in single-class learning to detect unknown malware

et al. 2011

Malware is any type of malicious code that has the potential to harm a computer or network. The volume of malware is growing at a faster rate every year and poses a serious global security threat. Although signature-based detection is the most widespread method used in commercial antivirus programs, it consistently fails to detect new malware. Supervised machine-learning models have been used to address this issue. However, the use of supervised learning is limited because it needs a large amount of malicious code and benign software to first be labelled. In this paper, we propose a new method that uses single-class learning to detect unknown malware families. This method is based on examining the frequencies of the appearance of opcode sequences to build a machine-learning classifier using only one set of labelled instances within a specific class of either malware or legitimate software. We performed an empirical study that shows that this method can reduce the effort of labelling software while maintaining high accuracy.

Email addresses: isantos@deusto.es (Igor Santos), felix.brezo@deusto.es (Felix Brezo), borja.sanz@deusto.es (Borja Sanz), claorden@deusto.es (Carlos Laorden), pablo.garcia.bringas@deusto.es (Pablo G. Bringas)
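The idea of learning from a single labelled class can be illustrated with a deliberately simple stand-in: fit a centroid on frequency vectors from one class only, then accept new samples whose cosine similarity to that centroid is at least as high as the least similar training sample. This centroid-and-threshold rule is an assumption chosen for brevity and is not the classifier used in the paper above; the names `fit_single_class` and `belongs_to_class` and the toy vectors are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = Dict[str, float]  # opcode sequence -> frequency


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def fit_single_class(vectors: List[Vector]) -> Tuple[Vector, float]:
    """Fit a centroid and similarity threshold from one labelled class only."""
    if not vectors:
        raise ValueError("at least one labelled vector is required")
    center: Vector = {}
    for vec in vectors:
        for key, value in vec.items():
            center[key] = center.get(key, 0.0) + value / len(vectors)
    # Assumption: the least similar training sample sets the acceptance bound.
    threshold = min(cosine_similarity(vec, center) for vec in vectors)
    return center, threshold


def belongs_to_class(vec: Vector, center: Vector, threshold: float) -> bool:
    """Accept a sample if it is at least as similar as the training bound."""
    return cosine_similarity(vec, center) >= threshold


# Hypothetical frequency vectors for the single labelled class.
labelled = [
    {"push mov": 0.6, "mov call": 0.4},
    {"push mov": 0.5, "mov call": 0.5},
]
center, threshold = fit_single_class(labelled)
outlier = {"xor jmp": 1.0}
```

With these toy vectors, both labelled samples are accepted and `outlier`, which shares no opcode sequence with the labelled class, is rejected. A practical system would need a more robust boundary than the minimum training similarity.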

A fast malware feature selection approach using a hybrid of multi‐linear and stepwise binary logistic regression

Huda, Abawajy, Abdollahian, et al. 2016

Concurrency and Computation

Summary: Malware replicates itself and produces offspring with the same characteristics but different signatures by using code obfuscation techniques. Current generation anti‐virus engines employ a signature‐template type detection approach where malware can easily evade existing signatures in the database. This reduces the capability of current anti‐virus engines in detecting malware. In this paper, we propose a stepwise binary logistic regression‐based dimensionality reduction technique for malware detection using application program interface (API) call statistics. Finding the most significant malware feature using traditional wrapper‐based approaches takes exponential complexity in the dimension (m) of the dataset with a brute‐force search strategy and order of (m‐1) complexity with a backward elimination filter heuristic. The novelty of the proposed approach is that its worst‐case computational complexity is less than order of (m‐1). The proposed approach uses multi‐linear regression and the p‐value of each individual API feature to select the most uncorrelated and significant features in order to reduce the dimensionality of the large malware data and to ensure the absence of multi‐collinearity. The stepwise logistic regression approach is then employed to test the significance of each individual malware feature based on its corresponding Wald statistic and to construct the binary decision model. When the selected most significant APIs are used in a decision rule generation system, this approach not only reduces the tree size but also improves classification performance. Exhaustive experiments on a large malware data set show that the proposed approach clearly exceeds the existing standard decision rule, support vector machine‐based template approach with complete data and provides a better statistical fitness. Copyright © 2016 John Wiley & Sons, Ltd.
