Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis 2023
DOI: 10.1145/3597926.3598035
CONCORD: Clone-Aware Contrastive Learning for Source Code

Abstract: Deep Learning (DL) models for analyzing source code have shown immense promise over the past few years. More recently, self-supervised pre-training has gained traction for learning generic code representations valuable for many downstream SE tasks, such as clone and bug detection. While previous work successfully learned from different code abstractions (e.g., token, AST, graph), we argue that it is also essential to factor in how developers code day-to-day for learning general-purpose representations. On the one h…
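As a rough illustration of the contrastive pre-training idea the abstract describes (not the paper's actual implementation), the sketch below computes an NT-Xent-style loss that pulls a function's embedding toward the embedding of a clone-like variant and pushes it away from other functions in the batch. The encoder, tokenizer, and the `temperature` value are assumptions made for the example.

```python
import torch
import torch.nn.functional as F

def ntxent_loss(orig_emb, clone_emb, temperature=0.07):
    """NT-Xent contrastive loss: each function embedding should be close to
    its clone-like variant and far from the other functions in the batch.
    The temperature is an illustrative default, not a value from the paper."""
    orig = F.normalize(orig_emb, dim=-1)          # (B, D) unit-norm embeddings
    clone = F.normalize(clone_emb, dim=-1)        # (B, D)
    logits = orig @ clone.t() / temperature       # (B, B) pairwise similarities
    targets = torch.arange(orig.size(0), device=orig.device)
    # Row i's positive is column i (its own clone); all other columns are negatives.
    return F.cross_entropy(logits, targets)

# Hypothetical usage: `encoder` maps tokenized source code to vectors.
# orig_emb, clone_emb = encoder(orig_tokens), encoder(clone_tokens)
# loss = ntxent_loss(orig_emb, clone_emb)
```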

Cited by: 3 publications (1 citation statement)
References: 66 publications
“…To train a detection model robust to random perturbations, we borrow the contrastive learning technique to learn better feature representations. Despite the similarity in the high-level design idea [4,23,34,46], i.e., pre-training a self-supervised feature-acquisition model over a large unlabeled code database and performing supervised fine-tuning over a labeled dataset to transfer it to a specific downstream SE task, we employ an additional supervised contrastive loss term to effectively leverage label information.…”
Section: Combinatorial Contrastive Learning (mentioning)
confidence: 99%
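The citing work's idea of adding a supervised contrastive term on top of fine-tuning can be sketched generically as below. This follows a standard SupCon-style formulation; the weighting factor `alpha`, the temperature, and how embeddings and labels are produced are assumptions for illustration, not details from either paper.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Pull together embeddings that share a label, push apart the rest.
    A generic SupCon-style loss; hyperparameters are illustrative."""
    z = F.normalize(embeddings, dim=-1)                       # (N, D)
    sim = z @ z.t() / temperature                             # (N, N)
    n = z.size(0)
    diag = torch.eye(n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(diag, float('-inf'))                # drop self-similarity
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    # Positives: examples with the same label, excluding the example itself.
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~diag
    pos_counts = pos_mask.sum(dim=1).clamp(min=1)
    loss = -(log_prob * pos_mask).sum(dim=1) / pos_counts
    return loss.mean()

# Hypothetical fine-tuning objective combining a downstream task loss with
# the supervised contrastive term (alpha is an assumed weighting factor):
# total_loss = task_loss + alpha * supervised_contrastive_loss(z, labels)
```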