2020
DOI: 10.1609/aaai.v34i01.5466

Order Matters: Semantic-Aware Neural Networks for Binary Code Similarity Detection

Abstract: Binary code similarity detection, whose goal is to detect similar binary functions without having access to the source code, is an essential task in computer security. Traditional methods usually use graph matching algorithms, which are slow and inaccurate. Recently, neural network-based approaches have made great achievements. A binary function is first represented as a control-flow graph (CFG) with manually selected block features, and then a graph neural network (GNN) is adopted to compute the graph embeddin…

Citations: cited by 161 publications (110 citation statements)
References: 13 publications
“…Roberto et al [26] proposed various feature embedding generation schemes of the basic block based on the similar Siamese network to calculate similarity. Yu et al [44] propose combining semantic, structural, and order features of functions to generate the embedding of functions for similarity comparison.…”
Section: Learning-based Approaches (mentioning)
confidence: 99%
“…Massarelli [16] shows a similar method as Gemini [], but replace manual feature extraction with unsupervised-based neural network feature extraction. Yu [19] uses BERT to pretrain the binary code on one token-level task, one block-level task, and two graph-level tasks. Moreover, they adopt convolution neural network (CNN) on adjacency matrices to extract the order information of CFG nodes.…”
Section: Related Work (mentioning)
confidence: 99%
“…How to deal with the OOV problem is a challenge. Order Matters [19] used BERT to pre-train the binary code on one token-level task, one block-level task, and two graph-level tasks. They also adopt a convolution neural network (CNN) on adjacency matrices to extract the order information.…”
Section: Introduction (mentioning)
confidence: 99%
“…Some other features are also taken into consideration, such as numeric features in instruction (Eschweiler et al 2016), call graph (Gao et al 2008; Liu et al 2018; Wang et al 2009) and control flow graph (Eschweiler et al 2016; Xu et al 2017; Feng et al 2016), and semantic related information including IO behavior (Pewny et al 2015) and execution traces (Chandramohan et al 2016). Besides manually selected features, some work automatically extracts semantic information with the help of NLP techniques (Ding et al 2019; Duan et al 2020; Yu et al 2020).…”
Section: Binary-to-binary (mentioning)
confidence: 99%