Timofey Bryksin scite author profile

PathMiner: A Library for Mining of Path-Based Representations of Code

One recent, significant advance in modeling source code for machine learning algorithms has been the introduction of path-based representation, an approach that represents a snippet of code as a collection of paths from its syntax tree. Such a representation efficiently captures the structure of code, which, in turn, carries its semantics and other information. Building the path-based representation involves parsing the code and extracting the paths from its syntax tree; together, these steps amount to a substantial engineering effort. Because no common reusable toolkit exists for this task, the burden of mining diverts researchers from the essential work and hinders newcomers to the field of machine learning on code. In this paper, we present PathMiner, an open-source library for mining path-based representations of code. PathMiner is fast, flexible, well-tested, and easily extensible to support input code in any common programming language.
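As a rough illustration of the two steps the abstract names (parsing, then extracting paths from the syntax tree), the sketch below uses Python's standard `ast` module. It is not PathMiner's API, which the abstract does not describe. The function names `leaf_paths`, `token`, and `path_contexts`, the `max_length` parameter, and the choice to record each path as a tuple of node type names are all assumptions made for this example.

```python
import ast
from itertools import combinations


def leaf_paths(tree):
    """Return one root-to-leaf list of AST nodes for every leaf in the tree."""
    paths = []

    def walk(node, prefix):
        prefix = prefix + [node]
        # Assumption: Load/Store/Del context markers are ignored as noise.
        children = [c for c in ast.iter_child_nodes(node)
                    if not isinstance(c, ast.expr_context)]
        if not children:
            paths.append(prefix)
        for child in children:
            walk(child, prefix)

    walk(tree, [])
    return paths


def token(node):
    """Pick a printable token for a leaf node (hypothetical helper)."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.arg):
        return node.arg
    if isinstance(node, ast.Constant):
        return repr(node.value)
    return type(node).__name__


def path_contexts(source, max_length=8):
    """Return (start_token, path, end_token) triples for all pairs of leaves.

    The path lists node type names from the first leaf up to the lowest
    common ancestor and back down to the second leaf.
    """
    tree = ast.parse(source)
    contexts = []
    for a, b in combinations(leaf_paths(tree), 2):
        # Count the nodes shared by both root-to-leaf paths.
        k = 0
        while k < min(len(a), len(b)) and a[k] is b[k]:
            k += 1
        # a[k - 1] is the lowest common ancestor of the two leaves.
        nodes = list(reversed(a[k - 1:])) + b[k:]
        if len(nodes) <= max_length:
            path = tuple(type(n).__name__ for n in nodes)
            contexts.append((token(a[-1]), path, token(b[-1])))
    return contexts


if __name__ == "__main__":
    for context in path_contexts("x = y + 1"):
        print(context)
```

Enumerating all leaf pairs is quadratic in the number of leaves, so a length limit such as the assumed `max_length` keeps the output manageable on larger snippets.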


Authorship attribution of source code: a language-agnostic approach and applicability in software engineering

Bogomolov, Kovalenko, Rebryk, et al. 2021

Authorship attribution (i.e., determining who is the author of a piece of source code) is an established research topic. State-of-the-art results for the authorship attribution problem look promising for the software engineering field, where they could be applied to detect plagiarized code and prevent legal issues. In this article, we first introduce a new language-agnostic approach to authorship attribution of source code. Then, we discuss the limitations of existing synthetic datasets for authorship attribution and propose a data collection approach that delivers datasets that better reflect aspects important for potential practical use in software engineering. Finally, we demonstrate that the high accuracy of authorship attribution models on existing datasets drops drastically when they are evaluated on more realistic data. We outline next steps for the design and evaluation of authorship attribution models that could bring the research efforts closer to practical use in software engineering.

CCS Concepts: • Software and its engineering → Software maintenance tools; Software verification and validation; • Security and privacy → Malware and its mitigation.


One thousand and one stories: a large-scale survey of software refactoring

Golubev, Kurbatova, AlOmar, et al. 2021

Hyperstyle

Birillo, Vlasov, Burylov, et al. 2022

In software engineering, it is not enough to write code that merely works as intended, even if it is free from vulnerabilities and bugs. Every programming language has a style guide and a set of best practices defined by its community, which help practitioners build solutions that have a clear structure and are therefore easy to read and maintain. To introduce assessment of code quality into the educational process, we developed a tool called Hyperstyle. To make it reflect the needs of the programming community and at the same time be easily extensible, we built it upon several existing professional linters and code checkers. Hyperstyle supports four programming languages (Python, Java, Kotlin, and JavaScript) and can be used as a standalone tool or integrated into a MOOC platform. We have integrated the tool into two educational platforms, Stepik and JetBrains Academy, and it has been used to process about one million submissions every week since May 2021.

CCS Concepts: • Social and professional topics → Student assessment; • Applied computing → Education; • Human-centered computing → Interactive systems and tools.


Recommendation of Move Method Refactoring Using Path-Based Representation of Code

Kurbatova, Veselov, Golubev, et al. 2020

