Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation 2016
DOI: 10.1145/2908080.2908126

Statistical similarity of binaries

Cited by 103 publications
(105 citation statements)
References 16 publications
“…Single Platform solutions - Regarding the literature of binary similarity for a single platform, a family of works is based on matching algorithms for function CFGs. In Bindiff [13], matching among vertices is based on the syntax of code, and it is known to perform poorly across different compilers (see [9]). Pewny et al. [24] proposed a solution where each vertex of a CFG is represented with an expression tree; similarity among vertices is computed by using the edit distance between the corresponding expression trees.…”
Section: Work Not Based On Embeddings (mentioning)
confidence: 99%
“…David and Yahav [11] proposed to represent a function as several independent execution traces, called tracelets; similar tracelets are then matched by using a custom edit distance. A related concept is used by David et al. in [9], where functions are divided into pieces of independent code, called strands. The matching between functions is based on how many statistically significant strands are similar.…”
Section: Work Not Based On Embeddings (mentioning)
confidence: 99%
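
The idea in the statement above, that a match between functions should count shared strands more when those strands are statistically significant, can be illustrated with a minimal sketch. This is not the method of the cited paper: the strand strings, the toy corpus, and the rarity weight `log(N / df)` are all assumptions chosen only to show why a strand that appears everywhere should contribute little to a match.

```python
import math
from collections import Counter

def strand_weights(corpus):
    """Weight each strand by its rarity across the corpus (assumed weighting).

    corpus maps a function name to the set of its (hypothetical, already
    normalized) strands. A strand found in every function gets weight 0.
    """
    n = len(corpus)
    doc_freq = Counter(s for strands in corpus.values() for s in strands)
    return {s: math.log(n / count) for s, count in doc_freq.items()}

def similarity(query, target, weights):
    """Sum the weights of the strands the two functions share."""
    return sum(weights.get(s, 0.0) for s in query & target)

# Toy corpus: strand text is illustrative, not real lifted code.
corpus = {
    "f1": {"a = b + 1", "ret a", "c = a * 2"},
    "f2": {"ret a", "d = e ^ e"},
    "f3": {"a = b + 1", "ret a", "x = y - 1"},
}
weights = strand_weights(corpus)
score_f1_f2 = similarity(corpus["f1"], corpus["f2"], weights)
score_f1_f3 = similarity(corpus["f1"], corpus["f3"], weights)
```

Here `f1` and `f2` share only the ubiquitous strand `ret a`, so their score is zero, while `f1` and `f3` also share a rarer strand and score higher.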
“…First, static plagiarism detection or clone detection includes string-based [2], [5], [15], AST-based [32], [57], [63], [36], token-based [33], [55], [54], and PDG-based [22], [40], [11], [39]. Source code-based approaches are Recent works have applied traditional approaches to addressing the cross-architecture scenario [53], [19], [8], [20], [13], [14], [12]. Multi-MH and Multi-k-MH [53] are the first two methods for comparing functions of different ISAs.…”
Section: Related Work (mentioning)
confidence: 99%
“…discovRE [19] boosts the CFG-based matching process, but is still expensive. Both Esh [12] and its successor [13] use dataflow slices of basic blocks as the basic comparable unit. Esh uses an SMT solver to verify function similarity, which makes it unscalable.…”
Section: Related Work (mentioning)
confidence: 99%
“…Algorithm 1 presents the pseudo-code of instrumentation. BINMATCH traverses each instruction (I) of F. If I accesses global variables, performs comparison operations, or calls a standard library function, BINMATCH injects code before I to capture corresponding features and generate the signature of F (Line 4-9).…”

Algorithm 1 (fragment; the listing begins mid-algorithm):

        Ir ← record_oprd_val(Ir)
     8  if I calls a standard library function then
     9      Ir ← record_libc_name(Ir)
    10  // record runtime information
    11  if I reads an argument of the function then
    12      Ir ← record_arg_val(Ir)
    13  else if I calls a function indirectly then
    14      Ir ← record_func_addr(Ir)
    15  else if a function returns then
    16      Ir ← record_ret_val(Ir)
    17  return Ir
Section: B Instrumentation and Execution (mentioning)
confidence: 99%
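
The branch structure visible in the Algorithm 1 fragment quoted above can be sketched as follows. This is a rough illustration, not BINMATCH's implementation: the `Instr` type, the string tags in `kinds`, and the representation of injected code as hook names placed before each instruction are all hypothetical. The condition guarding `record_oprd_val` is not visible in the fragment, so tying it to global-variable accesses and comparisons is an assumption based on the quoted sentence.

```python
from dataclasses import dataclass, field

@dataclass
class Instr:
    """Hypothetical instruction model: text plus tags describing what it does."""
    text: str
    kinds: set = field(default_factory=set)

def instrument(function):
    """Return the instruction stream with recording hooks injected before
    the instructions that qualify (hooks are modeled as plain strings)."""
    out = []
    for ins in function:
        hooks = []
        # Assumed condition: the fragment does not show this guard.
        if ins.kinds & {"global_access", "compare"}:
            hooks.append("record_oprd_val")
        if "libc_call" in ins.kinds:
            hooks.append("record_libc_name")
        # Runtime information: mutually exclusive branches, as in the fragment.
        if "reads_arg" in ins.kinds:
            hooks.append("record_arg_val")
        elif "indirect_call" in ins.kinds:
            hooks.append("record_func_addr")
        elif "ret" in ins.kinds:
            hooks.append("record_ret_val")
        out.extend(hooks)     # injected code goes before the instruction
        out.append(ins.text)
    return out

demo = instrument([
    Instr("mov eax, [arg0]", {"reads_arg"}),
    Instr("cmp eax, 0", {"compare"}),
    Instr("call strlen", {"libc_call"}),
    Instr("ret", {"ret"}),
])
```

In this sketch `demo` interleaves hook names with the original instructions, so each recorded feature sits directly ahead of the instruction that triggered it.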