Jeremy Lacomis scite author profile

The decompiler is one of the most common tools for examining binaries without corresponding source code. It transforms binaries into high-level code, reversing the compilation process. Decompilers can reconstruct much of the information that is lost during compilation (e.g., structure and type information). Unfortunately, they do not reconstruct semantically meaningful variable names, which are known to increase code understandability. We propose the Decompiled Identifier Renaming Engine (DIRE), a novel probabilistic technique for variable name recovery that uses both lexical and structural information recovered by the decompiler. We also present a technique for generating corpora suitable for training and evaluating models of decompiled code renaming, which we use to create a corpus of 164,632 unique x86-64 binaries generated from C projects mined from GitHub. Our results show that on this corpus DIRE can predict variable names identical to the names in the original source code up to 74.3% of the time.
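The headline result is the fraction of predicted variable names that exactly match the original source names. The abstract does not give the precise scoring procedure, so the sketch below assumes a simple per-variable exact-match rate. The function name `exact_match_accuracy`, the decompiler-style placeholders (`v1`, `a1`, ...), and the example data are all hypothetical.

```python
from typing import Dict


def exact_match_accuracy(predicted: Dict[str, str], original: Dict[str, str]) -> float:
    """Fraction of decompiler placeholder variables whose predicted name
    exactly equals the developer's original name.

    Both dicts map a decompiler-assigned placeholder (e.g. "v1") to a name.
    Placeholders missing from `predicted` count as misses.
    """
    if not original:
        return 0.0
    hits = sum(1 for var, name in original.items() if predicted.get(var) == name)
    return hits / len(original)


# Hypothetical example: placeholder names and their original source names.
original = {"v1": "count", "v2": "buf", "a1": "fd", "a2": "len"}
predicted = {"v1": "count", "v2": "buffer", "a1": "fd", "a2": "len"}
print(exact_match_accuracy(predicted, original))  # 0.75
```

Under this strict metric, a close but non-identical guess such as `buffer` for `buf` counts as a miss.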


Automatically Exploring Tradeoffs Between Software Output Fidelity and Energy Costs

Dorn

Lacomis

Weimer

et al. 2019

IEEE Trans. Software Eng.


Meaningful variable names for decompiled code

Jaffe

Lacomis

Schwartz

et al. 2018


VarCLR: Variable Semantic Representation Pre-training via Contrastive Learning

Chen

Lacomis

Schwartz

et al. 2021

Preprint


Variable names are critical for conveying intended program behavior. Machine learning-based program analysis methods use variable name representations for a wide range of tasks, such as suggesting new variable names and detecting bugs. Ideally, such methods could capture semantic relationships between names beyond syntactic similarity, e.g., the fact that the names "average" and "mean" are similar. Unfortunately, previous work has found that even the best previous representation approaches primarily capture "relatedness" (whether two variables are linked at all), rather than "similarity" (whether they actually have the same meaning). We propose VarCLR, a new approach for learning semantic representations of variable names that effectively captures variable similarity in this stricter sense. We observe that this problem is an excellent fit for contrastive learning, which aims to minimize the distance between explicitly similar inputs while maximizing the distance between dissimilar inputs. This requires labeled training data, so we construct a novel, weakly supervised variable renaming dataset mined from GitHub edits. We show that VarCLR enables the effective application of sophisticated, general-purpose language models like BERT to variable name representation, and thus also to related downstream tasks like variable name similarity search or spelling correction. VarCLR produces models that significantly outperform the state of the art on IdBench, an existing benchmark that explicitly captures variable similarity (as distinct from relatedness). Finally, we release all data, code, and pre-trained models, aiming to provide a drop-in replacement for variable representations used in existing or future program analyses that rely on variable names.

INTRODUCTION

Variable names convey key information about code structure and developer intention. They are thus central for code comprehension, readability, and maintainability [7, 46].
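Contrastive learning, as described above, pulls the embeddings of known-similar names together and pushes dissimilar ones apart. One common way to express this is an InfoNCE loss with in-batch negatives. The Python sketch below is a generic illustration of that objective on plain lists of floats. It is not VarCLR's released code and does not necessarily match its exact loss. It assumes the name embeddings (which in practice would come from an encoder such as BERT) are already computed and nonzero. The function names and the default temperature of 0.1 are illustrative choices.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce_loss(anchors: List[Vector], positives: List[Vector],
                  temperature: float = 0.1) -> float:
    """Average InfoNCE loss with in-batch negatives.

    anchors[i] and positives[i] embed two names assumed to be similar
    (e.g. a variable before and after a rename edit). Every other
    positives[j] in the batch serves as a negative for anchors[i].
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log softmax probability assigned to the true pair.
        total += log_denom - logits[i]
    return total / len(anchors)


anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]   # each anchor matches its own positive
swapped = [[0.0, 1.0], [1.0, 0.0]]   # each anchor matches the wrong positive
print(info_nce_loss(anchors, aligned) < info_nce_loss(anchors, swapped))  # True
```

A lower temperature sharpens the softmax, so the loss penalizes a negative that scores close to the true pair more heavily.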


VarCLR

Chen

Lacomis

Schwartz

et al. 2022



Jeremy Lacomis

DIRE: A Neural Approach to Decompiled Identifier Naming

Automatically Exploring Tradeoffs Between Software Output Fidelity and Energy Costs

Meaningful variable names for decompiled code

VarCLR: Variable Semantic Representation Pre-training via Contrastive Learning

VarCLR
