2023
DOI: 10.48550/arxiv.2302.03908
Preprint
Syntax and Domain Aware Model for Unsupervised Program Translation

Abstract: Interest in software migration is growing as software and society develop. Manually migrating projects between languages is error-prone and expensive. In recent years, researchers have begun to explore automatic program translation using supervised deep learning techniques that learn from large-scale parallel code corpora. However, parallel resources are scarce in the programming-language domain, and collecting bilingual data manually is costly. To address this issue, several unsupervised p…

Cited by 3 publications (3 citation statements)
References 32 publications (85 reference statements)
“…For the neural code translation and neural code repair tasks, following previous work [32], [6], we use the Bilingual Evaluation Understudy (BLEU) score, a common evaluation metric in neural code translation studies [3]. The BLEU score evaluates the quality of generated functions by measuring the N-gram overlap between the reference sentence ŷ and the translation y. BLEU-4 is used to measure the quality of the translation [2]. The formula is shown in Eq. 4.…”
Section: Evaluation Metrics
confidence: 99%
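The BLEU-4 computation described in the statement above (clipped N-gram overlap between reference and translation) can be sketched as follows. This is a minimal illustrative implementation of sentence-level BLEU-4, not the exact formulation in Eq. 4 of the citing paper; the smoothing constant is an assumption added to avoid log(0).

```python
import math
from collections import Counter

def ngram_counts(tokens, n):
    """Multiset of all n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu4(reference, candidate):
    """Sentence-level BLEU-4: geometric mean of clipped 1- to 4-gram
    precisions, multiplied by a brevity penalty for short candidates."""
    log_precisions = []
    for n in range(1, 5):
        cand = ngram_counts(candidate, n)
        ref = ngram_counts(reference, n)
        overlap = sum((cand & ref).values())           # clipped matches
        total = max(sum(cand.values()), 1)
        log_precisions.append(math.log(max(overlap, 1e-9) / total))
    geo_mean = math.exp(sum(log_precisions) / 4)
    brevity = min(1.0, math.exp(1 - len(reference) / max(len(candidate), 1)))
    return brevity * geo_mean

ref = "def add ( a , b ) : return a + b".split()
hyp = "def add ( a , b ) : return a + b".split()
print(bleu4(ref, hyp))  # a perfect match scores 1.0
```

Production evaluations typically use a library implementation (e.g., NLTK's `sentence_bleu` or sacreBLEU) rather than hand-rolled code, since smoothing and tokenization choices materially affect scores.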
“…It supports different datasets, such as ImageNet and COCO, and tasks such as image classification and recommendation. At the same time, the software engineering community has established common downstream tasks (such as program translation [27]) and their corresponding metrics (such as CodeBLEU [45]) for evaluating ML models in this domain. This has facilitated the creation of benchmarks like CodeXGLUE [33] that enable fair and consistent comparison of different ML models.…”
Section: Overview of Benchmarking ML Tools
confidence: 99%
“…Code generation is an essential generation task in natural language processing (NLP) and software engineering [20,24,7,8,18,35,36,16]; it deals with automatically generating a piece of executable code from NL utterances. In recent years, a series of Seq2Tree models have achieved remarkable results in code generation [2,38,1,39,27,29,28,33,11,14,43,21]. Specifically, given an NL utterance as input, instead of outputting a sequence of code tokens directly, a Seq2Tree model outputs a sequence of AST actions.…”
Section: Introduction
confidence: 99%
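The idea of emitting AST actions rather than code tokens can be illustrated with a toy linearization. This sketch uses Python's standard `ast` module and emits node-type names in pre-order; real Seq2Tree decoders predict grammar-production actions (e.g., apply-rule and generate-token operations), so this is only a stand-in for the concept, not any cited model's action formalism.

```python
import ast

def ast_actions(node):
    """Pre-order walk over a Python AST, yielding one 'action' (the
    node-type name) per node -- a toy stand-in for the grammar-rule
    actions a Seq2Tree decoder predicts instead of raw code tokens."""
    yield type(node).__name__
    for child in ast.iter_child_nodes(node):
        yield from ast_actions(child)

actions = list(ast_actions(ast.parse("x = a + b")))
print(actions)
```

Decoding into an action sequence like this guarantees the output parses into a well-formed tree, which is the main advantage Seq2Tree models have over token-level decoders.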