Marc Szafraniec scite author profile

Marc Szafraniec

5Publications

19Citation Statements Received

149Citation Statements Given

How they've been cited

How they cite others

112

147

Affiliations

Publications

Order By: Most citations

Continuous Surface Embeddings

Neverova¹,

Novotný²,

Khalidov³

et al. 2020

Preprint

View full text Add to dashboard Cite

In this work, we focus on the task of learning and representing dense correspondences in deformable object categories. While this problem has been considered before, solutions so far have been rather ad-hoc for specific object types (i.e., humans), often with significant manual work involved. However, scaling the geometry understanding to all objects in nature requires more automated approaches that can also express correspondences between related, but geometrically different objects. To this end, we propose a new, learnable image-based representation of dense correspondences. Our model predicts, for each pixel in a 2D image, an embedding vector of the corresponding vertex in the object mesh, therefore establishing dense correspondences between image pixels and 3D object geometry. We demonstrate that the proposed approach performs on par or better than the state-ofthe-art methods for dense pose estimation for humans, while being conceptually simpler. We also collect a new in-the-wild dataset of dense correspondences for animal classes and demonstrate that our framework scales naturally to the new deformable object categories.34th Conference on Neural Information Processing Systems (NeurIPS 2020),

show abstract

DOBF: A Deobfuscation Pre-Training Objective for Programming Languages

Rozière¹,

Lachaux²,

Szafraniec³

et al. 2021

Preprint

View full text Add to dashboard Cite

Recent advances in self-supervised learning have dramatically improved the state of the art on a wide variety of tasks. However, research in language model pre-training has mostly focused on natural languages, and it is unclear whether models like BERT and its variants provide the best pretraining when applied to other modalities, such as source code. In this paper, we introduce a new pre-training objective, DOBF, that leverages the structural aspect of programming languages and pre-trains a model to recover the original version of obfuscated source code. We show that models pre-trained with DOBF significantly outperform existing approaches on multiple downstream tasks, providing relative improvements of up to 13% in unsupervised code translation, and 24% in natural language code search. Incidentally, we found that our pre-trained model is able to deobfuscate fully obfuscated source files, and to suggest descriptive variable names.

show abstract

Code Translation with Compiler Representations

Szafraniec¹,

Rozière²,

Leather³

et al. 2022

Preprint

View full text Add to dashboard Cite

In this paper, we leverage low-level compiler intermediate representations (IR) to improve code translation. Traditional transpilers rely on syntactic information and handcrafted rules, which limits their applicability and produces unnaturallooking code. Applying neural machine translation (NMT) approaches to code has successfully broadened the set of programs on which one can get a naturallooking translation. However, they treat the code as sequences of text tokens, and still do not differentiate well enough between similar pieces of code which have different semantics in different languages. The consequence is low quality translation, reducing the practicality of NMT, and stressing the need for an approach significantly increasing its accuracy. Here we propose to augment code translation with IRs, specifically LLVM IR, with results on the C++, Java, Rust, and Go languages. Our method improves upon the state of the art for unsupervised code translation, increasing the number of correct translations by 11% on average, and up to 79% for the Java → Rust pair. We extend previous test sets for code translation, by adding hundreds of Go and Rust functions. Additionally, we train models with high performance on the problem of IR decompilation, generating programming source code from IR, and study using IRs as intermediary pivot for translation. * Equal contribution 2 https://en.wikipedia.org/wiki/Source-to-source_compiler Preprint. Under review.

show abstract

Putting Self-Supervised Token Embedding on the Tables

Szafraniec¹,

Marti

Donnat³

2017

View full text Add to dashboard Cite

Information distribution by electronic messages is a privileged means of transmission for many businesses and individuals, often under the form of plain-text tables. As their number grows, it becomes necessary to use an algorithm to extract text and numbers instead of a human. Usual methods are focused on regular expressions or on a strict structure in the data, but are not efficient when we have many variations, fuzzy structure or implicit labels. In this paper we introduce SC2T, a totally self-supervised model for constructing vector representations of tokens in semi-structured messages by using characters and context levels that address these issues. It can then be used for an unsupervised labeling of tokens, or be the basis for a semi-supervised information extraction system.

show abstract

Putting Self-Supervised Token Embedding on the Tables

Szafraniec¹,

Marti²,

Donnat³

2017

Preprint

View full text Add to dashboard Cite

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Marc Szafraniec

Continuous Surface Embeddings

DOBF: A Deobfuscation Pre-Training Objective for Programming Languages

Code Translation with Compiler Representations

Putting Self-Supervised Token Embedding on the Tables

Putting Self-Supervised Token Embedding on the Tables

Contact Info

Product

Resources

About