“…In this way, their similarity can be calculated using their corresponding embeddings. Typical code representation learning allows only one single code format of the matched objects, i.e., either source-to-source [16,37,38,49,61] or binary-to-binary [28,35,39,58,67] code matching. However, for binary source code matching, C/C++ language features (e.g., function inlining [23]) and compiler optimization (e.g., code motion [30]) can lead to substantial differences between binary code and source code, and such disparity can be rather challenging when designing BinaryAI.…”