2019
DOI: 10.1007/978-3-030-22038-9_15

SAFE: Self-Attentive Function Embeddings for Binary Similarity

Abstract: The binary similarity problem consists in determining if two functions are similar by only considering their compiled form. Advanced techniques for binary similarity recently gained momentum as they can be applied in several fields, such as copyright disputes, malware analysis, vulnerability detection, etc., and thus have an immediate practical impact. Current solutions compare functions by first transforming their binary code in multi-dimensional vector representations (embeddings), and then comparing vectors…
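The abstract describes comparing functions through their embedding vectors but is cut off before naming the comparison. As a minimal sketch, assuming cosine similarity as the metric (an assumption, not something stated in the text above), two embeddings could be compared as follows. The vectors `emb_f1`, `emb_f2`, and `emb_g` are made-up illustrative values, not outputs of any real model.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("a zero vector has no direction to compare")
    return dot / (norm_a * norm_b)


# Hypothetical function embeddings (illustrative values only).
emb_f1 = [0.2, 0.7, 0.1]
emb_f2 = [0.4, 1.4, 0.2]   # same direction as emb_f1, different magnitude
emb_g = [0.9, -0.1, 0.3]   # points in a different direction

# A higher score means the two vectors point in more similar directions.
score_same = cosine_similarity(emb_f1, emb_f2)
score_diff = cosine_similarity(emb_f1, emb_g)
```

Under this assumption, a pair of functions would be judged similar when the score exceeds some chosen threshold; the threshold itself would have to be tuned on labeled data.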

Citations: cited by 132 publications (175 citation statements)
References: 23 publications
“…We performed a further comparison with the SAFE architecture proposed by us in a previous paper [29]. SAFE does not use the CFG but a self-attentive recurrent neural network that parses all instructions according to their addresses.…”
Section: Discussion (mentioning)
confidence: 99%
“…• we discuss our findings in Section VI. We note that despite taking into account the syntactic structure of code using the CFG our techniques underperform or have comparable performances, on both task, when compared with a solution [29] that examine sequentially all the disassembled instructions, without information on the control flow given by the CFG. We discuss our hypothesis on this phenomena, giving a possible explanation on the shortcomings of blindly embedding the CFG.…”
Section: Introduction (mentioning)
confidence: 92%
“…Also, it does not consider any program-wide CFG structural information during analysis. SAFE [41] leverages a self-attentive neural network to generate function embeddings.…”
Section: A Code Similarity Detection (mentioning)
confidence: 99%
“…Besides, manually feature engineering needs a lot of domain knowledge of assembly code, which is not friendly for most researchers. To address the above issues, static word representation based methods are applied to program language processing in recent works [4], [6]- [8]. In these works, tokens in the basic block, like operators (opcodes) and operands, are represented as fixed-dimension vectors.…”
Section: A Basic Block Embedding (mentioning)
confidence: 99%