Proceedings of the 2018 Network and Distributed System Security Symposium (NDSS)
DOI: 10.14722/ndss.2018.23304

When Coding Style Survives Compilation: De-anonymizing Programmers from Executable Binaries

Abstract: The ability to identify authors of computer programs based on their coding style is a direct threat to the privacy and anonymity of programmers. While recent work found that source code can be attributed to authors with high accuracy, attribution of executable binaries appears to be much more difficult. Many distinguishing features present in source code, e.g. variable names, are removed in the compilation process, and compiler optimization may alter the structure of a program, further obscuring features that …
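
As a rough illustration of the attribution setting the abstract describes (not the authors' actual feature set or model), the sketch below disassembles raw machine code with Capstone, counts instruction-mnemonic bigrams as one kind of style feature that survives compilation, and trains a random forest on them. The byte strings and author labels are fabricated placeholders.

```python
# Illustrative sketch only: mnemonic-bigram features + random forest.
# The paper itself uses a much richer feature set (disassembly,
# control-flow, and decompiled-AST features); this is a toy stand-in.
from collections import Counter
from capstone import Cs, CS_ARCH_X86, CS_MODE_64        # pip install capstone
from sklearn.ensemble import RandomForestClassifier     # pip install scikit-learn
from sklearn.feature_extraction import DictVectorizer

def mnemonic_bigrams(code: bytes) -> Counter:
    """Disassemble x86-64 bytes and count adjacent mnemonic pairs."""
    md = Cs(CS_ARCH_X86, CS_MODE_64)
    mnems = [insn.mnemonic for insn in md.disasm(code, 0x1000)]
    return Counter(f"{a} {b}" for a, b in zip(mnems, mnems[1:]))

# Fabricated training data: (raw .text bytes, author label) pairs.
samples = [
    (b"\x55\x48\x89\xe5\x89\x7d\xfc\x5d\xc3", "alice"),  # push; mov; mov; pop; ret
    (b"\x48\x31\xc0\x48\xff\xc0\xc3",         "bob"),    # xor; inc; ret
]
vec = DictVectorizer()
X = vec.fit_transform([dict(mnemonic_bigrams(code)) for code, _ in samples])
y = [author for _, author in samples]
clf = RandomForestClassifier(n_estimators=100).fit(X, y)  # attribution model
```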

Cited by 69 publications (122 citation statements) · References 29 publications

Citing statements, ordered by relevance:
“…To address the challenges of achieving formal utility-loss guarantees, e.g., 0 label loss and bounded confidence score distortion, we design new methods to find adversarial examples. Other than membership inference attacks, many other attacks rely on machine learning classifiers, e.g., attribute inference attacks [11,17,28], website fingerprinting attacks [7,22,29,46,67], side-channel attacks [73], location attacks [5,45,52,72], and author identification attacks [8,41]. For instance, online social network users are vulnerable to attribute inference attacks, in which an attacker leverages a machine learning classifier to infer users' private attributes (e.g., gender, political view, and sexual orientation) using their public data (e.g., page likes) on social networks.…”
Section: Discussion and Limitations
confidence: 99%
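
The attack pattern this statement describes is simple to reproduce in miniature. The following sketch, with entirely fabricated data, trains a logistic-regression "attacker" to infer a hidden attribute from public page-like indicators; the model choice and feature encoding are arbitrary assumptions, not taken from the cited work.

```python
# Toy attribute inference attack: public likes -> private attribute.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Rows = users; columns = whether each user liked pages 0..4 (public data).
likes = np.array([
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 1],
    [1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
])
private_attr = np.array([0, 1, 0, 1])  # attribute known for training users

attacker = LogisticRegression().fit(likes, private_attr)
victim = np.array([[1, 0, 1, 1, 0]])   # a user who revealed only their likes
print(attacker.predict(victim))        # the attacker's inferred attribute
```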
“…When the unpacking routine has finished its run, the execution pointer jumps to the first instruction of the original program. For example, UPX (Ultimate Packer for eXecutables) is a free and open-source executable packer that mainly compresses executables rather than obfuscating them. Authors may use this technique to load their programs into memory faster, owing to the smaller file size.…”
Section: Other Methods of Protecting Binaries
confidence: 99%
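
Since the statement names UPX specifically, here is a small sketch of the pack/unpack round trip it describes. It assumes the `upx` tool is on the PATH and that `./program` is some local executable; both are assumptions made for the demo, while the flags used (`-9`, `-t`, `-d`) are standard UPX options.

```python
# Pack an executable with UPX, verify it, then restore the original.
import shutil
import subprocess
from pathlib import Path

src, packed = Path("program"), Path("program.packed")   # hypothetical paths
shutil.copy(src, packed)

subprocess.run(["upx", "-9", str(packed)], check=True)  # compress in place, best ratio
print(f"original: {src.stat().st_size} B -> packed: {packed.stat().st_size} B")

subprocess.run(["upx", "-t", str(packed)], check=True)  # integrity-test the packed file
subprocess.run(["upx", "-d", str(packed)], check=True)  # unpack back to the original
```

At run time the packed binary behaves as the quote describes: the UPX stub decompresses the original program image in memory and then jumps to its original entry point.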
“…In some prominent studies such as [14], [2], authors have utilized machine learning techniques to correlate syntax-based features with authorship and thereby identify the author of program binaries. In [3], the authors analyzed the effects of compiler optimization (at three levels), removing symbol information, and applying basic binary obfuscation methods (such as instruction replacement and control-flow-graph obfuscation) on several features obtained mainly from disassembling and decompiling the executable binaries (e.g. token n-grams and features derived from the Abstract Syntax Tree).…”
Section: Other Methods of Protecting Binaries
confidence: 99%
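
The effect of optimization on such features is easy to observe directly. This sketch (assuming `gcc` and `objdump` are installed; the C snippet is a made-up example) compiles one function at -O0 and -O2 and compares the resulting mnemonic-bigram sets, a crude proxy for the token n-gram features the quote mentions.

```python
# Compare instruction-mnemonic bigrams of the same code at -O0 vs -O2.
import subprocess
from pathlib import Path

Path("demo.c").write_text(
    "int sum(int n){int s=0;for(int i=0;i<n;i++)s+=i;return s;}\n")

def bigrams(opt: str) -> set:
    subprocess.run(["gcc", opt, "-c", "demo.c", "-o", "demo.o"], check=True)
    dump = subprocess.run(["objdump", "-d", "demo.o"], check=True,
                          capture_output=True, text=True).stdout
    # Disassembly lines look like: "   4:\t48 89 e5 \tmov %rsp,%rbp"
    mnems = [line.split("\t")[2].split()[0]
             for line in dump.splitlines() if line.count("\t") >= 2]
    return set(zip(mnems, mnems[1:]))

o0, o2 = bigrams("-O0"), bigrams("-O2")
print(f"bigram Jaccard(-O0, -O2) = {len(o0 & o2) / len(o0 | o2):.2f}")
```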
“…Rosenblum et al. [21] apply machine learning to style features extracted from binaries; Caliskan-Islam et al. [6] build on this work. Muir and Wikström [17] find that changing compiler settings and linking statically can decrease attribution accuracy and obfuscate authorship against binary attribution classifiers, which is, to our knowledge, the only other work focused on authorship obfuscation for programs.…”
Section: Classifying Binaries
confidence: 99%