2022
DOI: 10.48550/arxiv.2205.12713
Preprint

jTrans: Jump-Aware Transformer for Binary Code Similarity

Abstract: Binary code similarity detection (BCSD) has important applications in various fields such as vulnerability detection, software component analysis, and reverse engineering. Recent studies have shown that deep neural networks (DNNs) can comprehend instructions or control-flow graphs (CFGs) of binary code and support BCSD. In this study, we propose a novel Transformer-based approach, namely jTrans, to learn representations of binary code. It is the first solution that embeds control flow information of binary co…

Cited by 5 publications (18 citation statements)
References 41 publications

“…We use IDA Pro binary, a disassembler tool [31] to dump the binary code of the functions for BinXray, which is the same as the BinXray used in their original experiments. To the best of our knowledge, no official open source implementation of Asm2Vec is available, so we utilized an unofficial implementation 5 followed [34] with default parameter settings. The experimental setup consists of an Intel CPU operating at 2.90GHz and running on Linux.…”
Section: Experimental Setting (mentioning)
confidence: 99%
“…Moreover, traditional graph-based methods such as graph isomorphism matching are excessively time-consuming for analyzing large-scale firmware, leading to relatively lower accuracy and scalability. In recent years, researchers have increasingly adopted learning-based approaches to tackle BCSD tasks, and the current state-of-the-art BCSD approaches [11,13,14,34,35] are predominantly based on machine learning (ML) techniques. These approaches typically involve the disassembly of binary code into either assembly language or intermediate representation (IR).…”
Section: Introduction (mentioning)
confidence: 99%
“…As a consequence, considerable differences arise at the lexical and syntactic levels, resulting in severe OOV challenges. For OOV words, existing approaches [11,34,35,37] use normalization based on predefined rules to deal with string literals, immediate numbers, and address offsets. However, when these special words are replaced with dummy tokens, the essential semantics may be compromised.…”
Section: Introduction (mentioning)
confidence: 99%
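
The rule-based normalization mentioned in the preceding statement can be sketched roughly as follows. This is a minimal illustration, not the procedure of any cited work: the placeholder tokens, the regular expressions, and the `normalize` helper are all assumptions made for the example, and the address pattern assumes IDA-style labels such as `loc_401000`.

```python
import re

# Hypothetical placeholder tokens; each real system defines its own vocabulary.
STR_TOKEN = "<STR>"
IMM_TOKEN = "<IMM>"
ADDR_TOKEN = "<ADDR>"

# Assumed rules, applied in this order so that digits inside strings and
# labels are consumed before the immediate rule runs.
_STRING = re.compile(r'"[^"]*"')
_ADDRESS = re.compile(r"\b(?:loc|sub|off|unk)_[0-9A-Fa-f]+\b")
_IMMEDIATE = re.compile(r"\b(?:0x[0-9A-Fa-f]+|[0-9][0-9A-Fa-f]*h|\d+)\b")


def normalize(instruction: str) -> str:
    """Replace string literals, address labels, and immediates with dummy tokens."""
    out = _STRING.sub(STR_TOKEN, instruction)
    out = _ADDRESS.sub(ADDR_TOKEN, out)
    out = _IMMEDIATE.sub(IMM_TOKEN, out)
    return out


if __name__ == "__main__":
    for ins in ["mov eax, 0x10", "jmp loc_401000", 'push "hello 42"', "cmp eax, 255"]:
        print(f"{ins!r:22} -> {normalize(ins)!r}")
```

Under these rules, `cmp eax, 1` and `cmp eax, 255` collapse to the same token sequence, which illustrates the loss of semantics that the statement describes when special words are replaced with dummy tokens.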