Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis 2022
DOI: 10.1145/3533767.3534367

jTrans: jump-aware transformer for binary code similarity detection

citations
Cited by 68 publications
(22 citation statements)
references
References 39 publications
“…Many existing binary-to-binary SCA techniques [32,56,70] integrate advanced embedding-based approaches to detect code similarity between binaries and further identify the reused libraries based on the SCA database. Specifically, they leverage deep neural network models to embed binary functions into the representation of vectors and perform binary code clone detection by measuring the similarity between function embeddings [11,40,58,68]. Apart from basic syntactic features, these techniques typically capture semantic features such as the control flow graph (CFG) for each binary function to strengthen their accuracy of code clone detection and the downstream SCA task.…”
Section: Background and Motivation, 2.1 Software Composition Analysis (mentioning)
confidence: 99%
“…In this way, their similarity can be calculated using their corresponding embeddings. Typical code representation learning allows only one single code format of the matched objects, i.e., either source-to-source [16,37,38,49,61] or binary-to-binary [28,35,39,58,67] code matching. However, for binary source code matching, C/C++ language features (e.g., function inlining [23]) and compiler optimization (e.g., code motion [30]) can lead to substantial differences between binary code and source code, and such disparity can be rather challenging when designing BinaryAI.…”
Section: Embedding-based Function Retrieval (mentioning)
confidence: 99%
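
The embedding-based matching described in the statement above can be illustrated with a minimal sketch. It assumes function embeddings are already available as plain lists of floats and uses cosine similarity as the metric. The function names (`cosine_similarity`, `rank_candidates`), the candidate labels, and the toy vectors are all hypothetical and are not taken from jTrans, BinaryAI, or any cited paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(query, candidates):
    """Return (name, score) pairs sorted from most to least similar to the query."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy embeddings; a real model would produce much higher-dimensional vectors.
query = [1.0, 0.0, 1.0]
candidates = {
    "memcpy_O2": [0.9, 0.1, 1.1],
    "strlen_O0": [0.0, 1.0, 0.0],
    "memcpy_O0": [1.0, 0.0, 0.5],
}

ranking = rank_candidates(query, candidates)
for name, score in ranking:
    print(f"{name}: {score:.4f}")
```

In this sketch the candidate whose vector points in nearly the same direction as the query ranks first, and the orthogonal one ranks last. The cited systems may use other metrics or learned similarity functions.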
“…CVSkSA first prunes the set of functions to be tested using the KNN model and then optimizes them in the function pre-filtering phase using the SVM model to improve the firmware vulnerability. Wang et al. [8] implemented a binary similarity detection tool called jTrans by embedding control information of binary code into a Transformer [9] model. The literature [10] uses a neural network translation model to learn the relationship between two architectures and maps the semantic information of the basic blocks of binary functions to a fixed dimensional vector, which in turn measures the similarity by the distance between the.…”
Section: Related Work (mentioning)
confidence: 99%
“…Then, it uses embedding technology to transform the graph into vector representations and uses these vectors to train a classifier to detect vulnerabilities. Transformer models, like jTrans [35], have also been utilized in this area. jTrans incorporates control flow information into the Transformer model for binary code similarity detection.…”
Section: Related Work (mentioning)
confidence: 99%