2022
DOI: 10.1109/tse.2021.3056139
|View full text |Cite
|
Sign up to set email alerts
|

Codee: A Tensor Embedding Scheme for Binary Code Search

Abstract: Given a target binary function, the binary code search retrieves top-K similar functions in the repository, and similar functions represent that they are compiled from the same source codes. Searching binary code is particularly challenging due to large variations of compiler tool-chains and options and CPU architectures, as well as thousands of binary codes. Furthermore, there are some pivotal issues in current binary code search schemes, including inaccurate text-based or token-based analysis, slow graph mat… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
4

Citation Types

0
29
0

Year Published

2022
2022
2023
2023

Publication Types

Select...
6
1

Relationship

1
6

Authors

Journals

citations
Cited by 33 publications
(29 citation statements)
references
References 47 publications
0
29
0
Order By: Relevance
“…Nevertheless, it does not use the normalization method for assembly instruction. DEEPBINDIFF [22] and Codee [45] also consider the division of instructions into opcodes and operands, and they adopt a normalization strategy for dealing with operands. BinDeep [46] applies a more fine-grained normalization strategy.…”
Section: Learning-based Approachesmentioning
confidence: 99%
“…Nevertheless, it does not use the normalization method for assembly instruction. DEEPBINDIFF [22] and Codee [45] also consider the division of instructions into opcodes and operands, and they adopt a normalization strategy for dealing with operands. BinDeep [46] applies a more fine-grained normalization strategy.…”
Section: Learning-based Approachesmentioning
confidence: 99%
“…The study of learning-based BCSD has been inspired by recent development in natural language processing (NLP) [39,45,56], which uses real-valued vectors called embeddings to encode semantic information of words and sentences. Building upon these techniques, previous studies [14,15,24,40,43,44,51,59,[61][62][63] applied deep learning methods to binary similarity detection. Shared by many of these studies is the idea of embedding binary functions into numerical vectors, and then using vector distance to approximate the similarity between different binary functions.…”
Section: Learning-based Bcsd Approachesmentioning
confidence: 99%
“…Gemini and VulSeeker encode basic blocks with manually-selected features, while GraphEmb and OrderMatters use neural networks to learn the embeddings of basic blocks. Another approach is propsoed by DEEPBINDIFF [15] and Codee [61], which uses neural networks to learn the embeddings of generated instruction sequences instead of embedding the ACFG of binary functions.…”
Section: Learning-based Bcsd Approachesmentioning
confidence: 99%
See 1 more Smart Citation
“…Therefore, we can leverage semantic features to improve change prediction models. In recent years, researchers have utilized deep learning technology to extract effective features from code to perform defect prediction, [12][13][14] code clone detection, 15,16 code search, 17,18 code summary, 19,20 and other software engineering tasks. Their common practice is to employ an abstract syntax tree (AST) to model the semantic information of the code.…”
mentioning
confidence: 99%