Malware phylogeny generation using permutations of code

Karim, Md. Enamul; Walenstein, Andrew; Lakhotia, Arun; Parida, Laxmi

doi:10.1007/s11416-005-0002-9

Cited by 181 publications (119 citation statements, all classified as mentioning)

References 12 publications

“…for authorship attribution, due to two factors. First, there are fewer programs per author (4-7) than in the other data sets (8-16), making this a fundamentally harder learning problem. More importantly, the programs in this data set do not reflect only the work of individual programmers; students in the course were often provided with substantial amounts of partially implemented skeleton code, and also worked closely with the course professor to follow an often rigid specification at the sub-module level.…”

Section: Classification (mentioning)

confidence: 99%

“…The instruction-level features we use are similar to those used in malware classification [2,8,9], particularly n-grams; our idiom features differ from features based on instruction sequences through the use of wildcards and the abstraction of low-level details like the opcode and immediate values. The instruction summary colors we use in the graphlet features are inspired by a technique to identify polymorphic malware variants [11]. Although some of the binary code representations we use are similar to existing work, our techniques are largely orthogonal: malware classification seeks to extract characteristics specific to a program or a family of programs with related behavior, while our authorship attribution techniques must discover more general properties of author style.…”

Section: Related Work (mentioning)

confidence: 99%


Who Wrote This Code? Identifying the Authors of Program Binaries

Rosenblum, Zhu, Miller, 2011, Lecture Notes in Computer Science

Abstract. Program authorship attribution (identifying a programmer based on stylistic characteristics of code) has practical implications for detecting software theft, digital forensics, and malware analysis. Authorship attribution is challenging in these domains, where usually only binary code is available; existing source-code-based approaches to attribution have left unclear whether, and to what extent, programmer style survives the compilation process. Casting authorship attribution as a machine learning problem, we present a novel program representation and techniques that automatically detect the stylistic features of binary code. We apply these techniques to two attribution problems: identifying the precise author of a program, and finding stylistic similarities between programs by unknown authors. Our experiments provide strong evidence that programmer style is preserved in program binaries.

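The related-work passage quoted above contrasts idiom features with the instruction-level n-gram features common in malware classification, and the cited paper's title refers to permutations of code. As a minimal sketch, assuming opcode sequences have already been extracted from the binaries, the Python below compares ordered n-grams with an order-insensitive variant in which each window of n opcodes is canonicalised by sorting, so that reordered instructions still match. The function names `ngrams`, `nperms` and `cosine` and the two opcode sequences are hypothetical, and the sort-based encoding is an assumption about how order-insensitive n-grams could be represented, not a reproduction of the cited paper's method.

```python
from collections import Counter
from math import sqrt


def ngrams(ops, n):
    """Ordered n-grams: every window of n consecutive opcodes, order preserved."""
    return Counter(tuple(ops[i:i + n]) for i in range(len(ops) - n + 1))


def nperms(ops, n):
    """Order-insensitive windows (hypothetical encoding): sorting each window
    maps all permutations of the same n opcodes to a single feature."""
    return Counter(tuple(sorted(ops[i:i + n])) for i in range(len(ops) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse feature-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


# Two hypothetical variants; the second swaps its first two instructions.
variant_a = ["push", "mov", "xor", "call", "ret"]
variant_b = ["mov", "push", "xor", "call", "ret"]

print(cosine(ngrams(variant_a, 2), ngrams(variant_b, 2)))  # 0.5
print(cosine(nperms(variant_a, 2), nperms(variant_b, 2)))  # 0.75
```

Swapping the first two instructions lowers the ordered 2-gram similarity to 0.5, while the order-insensitive features still share three of four windows and score 0.75.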

“…In their proposed work, the authors of [34] and [35] detected morphed malware variants using a rewriting engine. The syntactic and semantic structure of the variant programs was analysed.…”

Section: Existing Work (mentioning)

confidence: 99%

Detecting and Classifying Morphed Malwares: A Survey

Singla, Gandotra, Bansal et al., 2015, IJCA

In this era, most antivirus companies face immense difficulty in detecting morphed malware, as it conceals itself from detection. Malware uses various techniques to camouflage itself so as to increase its lifetime. These obfuscation techniques cannot completely impede analysis, but they prolong the process of analysis and detection. This paper presents a review of malware detection systems and the progress made in detecting advanced malware, which will serve as a reference for researchers interested in working on advanced malware detection systems.


“…Previous research on malware phylogeny inference mainly focused on tree-based models [1]-[6]. Karim et al. [1] used the UPGMA algorithm to generate phylogeny trees.…”

Section: Introduction (mentioning)

confidence: 99%

“…Karim et al. [1] used the UPGMA algorithm to generate phylogeny trees. Gupta et al. [6] proposed graph pruning techniques to establish phylogeny trees of malcode based on temporal information.…”

Section: Introduction (mentioning)

confidence: 99%

Inferring Phylogenetic Network of Malware Families Based on Splits Graph

Liu, Wang, Xie et al., 2017, IEICE Trans. Inf. & Syst.

Summary. Malware phylogeny refers to inferring the evolutionary relationships among instances of a family. It plays an important role in malware forensics. Previous works mainly focused on tree-based models. However, trees cannot represent reticulate events, such as inheriting code fragments from different parents, which are common in variant generation. Therefore, phylogenetic networks have been put forward as a more accurate and general model. In this paper, we propose a novel malware phylogenetic network construction method based on splits graphs, taking advantage of the one-to-one correspondence between reticulate events and netted components in a splits graph. We evaluate our algorithm on three malware families and two benign families whose ground truth is known, and compare it with competing algorithms. Experiments demonstrate that our method achieves a higher mean accuracy, of 64.8%.

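The citation statements above note that Karim et al. [1] built phylogeny trees with UPGMA. For illustration only, the sketch below implements textbook UPGMA: repeatedly merge the closest pair of clusters, then set the new cluster's distance to every other cluster to the size-weighted average of its two members' distances. The function name `upgma`, the four variant labels and the distance values are hypothetical, and this is not claimed to match the cited paper's implementation; the sketch returns only the tree topology as a Newick-style string and omits branch lengths.

```python
from itertools import combinations


def upgma(labels, dist):
    """Cluster leaves with UPGMA and return the topology as a Newick-style string.

    labels: leaf names; dist(x, y): symmetric distance between two leaves.
    Each cluster is keyed by its Newick string; size[c] counts the leaves it holds.
    """
    size = {name: 1 for name in labels}
    d = {frozenset(p): dist(*p) for p in combinations(labels, 2)}
    while len(size) > 1:
        # Closest pair of clusters; sorted names break ties deterministically.
        pair = min(d, key=lambda p: (d[p], sorted(p)))
        a, b = sorted(pair)
        merged = f"({a},{b})"
        size[merged] = size[a] + size[b]
        for c in size:
            if c in (a, b, merged):
                continue
            # UPGMA update: size-weighted average of the merged clusters' distances to c.
            d[frozenset((merged, c))] = (
                size[a] * d[frozenset((a, c))] + size[b] * d[frozenset((b, c))]
            ) / size[merged]
        # Discard every distance that still refers to a or b.
        d = {p: v for p, v in d.items() if a not in p and b not in p}
        del size[a], size[b]
    return next(iter(size))


# Hypothetical pairwise distances between four malware variants.
D = {
    ("v1", "v2"): 0.1, ("v1", "v3"): 0.6, ("v1", "v4"): 0.7,
    ("v2", "v3"): 0.5, ("v2", "v4"): 0.8, ("v3", "v4"): 0.3,
}


def dist(x, y):
    return D[(x, y)] if (x, y) in D else D[(y, x)]


print(upgma(["v1", "v2", "v3", "v4"], dist))  # ((v1,v2),(v3,v4))
```

Because the output is a strict tree, each variant ends up with exactly one parent cluster; a variant that inherits code from two lineages cannot be represented, which is the limitation the splits-graph paper above targets with phylogenetic networks.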