Yurii Rebryk scite author profile

Yurii Rebryk

3 Publications

22 Citation Statements Received

203 Citation Statements Given

Affiliations

National Research University Higher School of Economics

Publications

Authorship attribution of source code: a language-agnostic approach and applicability in software engineering

Bogomolov, Kovalenko, Rebryk et al. 2021

Authorship attribution (i.e., determining who is the author of a piece of source code) is an established research topic. State-of-the-art results for the authorship attribution problem look promising for the software engineering field, where they could be applied to detect plagiarized code and prevent legal issues. With this article, we first introduce a new language-agnostic approach to authorship attribution of source code. Then, we discuss limitations of existing synthetic datasets for authorship attribution, and propose a data collection approach that delivers datasets that better reflect aspects important for potential practical use in software engineering. Finally, we demonstrate that high accuracy of authorship attribution models on existing datasets drastically drops when they are evaluated on more realistic data. We outline next steps for the design and evaluation of authorship attribution models that could bring the research efforts closer to practical use for software engineering.

CCS CONCEPTS: • Software and its engineering → Software maintenance tools; Software verification and validation; • Security and privacy → Malware and its mitigation.

ConVoice: Real-Time Zero-Shot Voice Style Transfer with Convolutional Network

Rebryk, Beliaev 2020

Preprint

Authorship Attribution of Source Code: A Language-Agnostic Approach and Applicability in Software Engineering

Bogomolov, Kovalenko, Rebryk et al. 2020

Preprint

Authorship attribution of source code has been an established research topic for several decades. State-of-the-art results for the authorship attribution problem look promising for the software engineering field, where they could be applied to detect plagiarized code and prevent legal issues. With this study, we first introduce a language-agnostic approach to authorship attribution of source code. Two machine learning models based on our approach match or improve over state-of-the-art results, originally achieved by language-specific approaches, on existing datasets for code in C++, Python, and Java. After that, we discuss limitations of existing synthetic datasets for authorship attribution, and propose a data collection approach that delivers datasets that better reflect aspects important for potential practical use in software engineering. In particular, we discuss the concept of work context and its importance for authorship attribution. Finally, we demonstrate that high accuracy of authorship attribution models on existing datasets drastically drops when they are evaluated on more realistic data. We conclude the paper by outlining next steps in design and evaluation of authorship attribution models that could bring the research efforts closer to practical use.
