Proceedings of the IEEE/ACM 46th International Conference on Software Engineering 2024
DOI: 10.1145/3597503.3639133

Traces of Memorisation in Large Language Models for Code

Ali Al-Kaswan,
Maliheh Izadi,
Arie van Deursen

Abstract: Large language models have gained significant popularity because of their ability to generate human-like text and potential applications in various fields, such as Software Engineering. Large language models for code are commonly trained on large unsanitised corpora of source code scraped from the internet. The content of these datasets is memorised and can be extracted by attackers with data extraction attacks. In this work, we explore memorisation in large language models for code and compare the rate of mem…

citations
Cited by 3 publications
references
References 35 publications