2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE) 2021
DOI: 10.1109/icse43902.2021.00059

IdBench: Evaluating Semantic Representations of Identifier Names in Source Code

Abstract: Identifier names convey useful information about the intended semantics of code. Name-based program analyses use this information, e.g., to detect bugs, to predict types, and to improve the readability of code. At the core of name-based analyses are semantic representations of identifiers, e.g., in the form of learned embeddings. The high-level goal of such a representation is to encode whether two identifiers, e.g., len and size, are semantically similar. Unfortunately, it is currently unclear to what ext…

Cited by 17 publications (4 citation statements)
References 51 publications
“…Relevance or Similarity: Several studies define relevance as to how relevant is the model's output to the reference text or code [2,5,11,16]. Others asked developers to rate the similarity, relatedness, and contextual or semantic similarity between outputs and reference texts [9,10,15].…”
Section: Evaluation of NLP-based Models (mentioning)
confidence: 99%
“…To this end, the approach represents names and values as vectors that preserve their meaning. To represent identifier names, we build on learned token embeddings [13], which map each name into a vector while preserving the semantic similarities of names [54]. For example, the vector of probability will be close to the vectors of names probab and likelihood, because these names refer to similar concepts.…”
Section: Overview (mentioning)
confidence: 99%
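
The statement above describes names being mapped to vectors so that similar names end up close together. A minimal sketch of what "close" can mean, assuming cosine similarity as the distance measure, is shown below. The three-dimensional vectors are hypothetical, hand-picked values for illustration only; they are not taken from any trained model or from the cited work.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings (assumption: real embeddings have
# hundreds of dimensions and are learned from a code corpus).
toy_embeddings = {
    "probability": [0.9, 0.1, 0.2],
    "likelihood": [0.8, 0.2, 0.3],
    "probab": [0.85, 0.15, 0.2],
    "filename": [0.1, 0.9, 0.4],
}


def most_similar(name, embeddings):
    """Rank all other names by cosine similarity to `name`, best first."""
    query = embeddings[name]
    scored = [
        (other, cosine_similarity(query, vec))
        for other, vec in embeddings.items()
        if other != name
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for other, score in most_similar("probability", toy_embeddings):
        print(f"{other}: {score:.3f}")
```

With these toy values, the unrelated name `filename` ranks last for the query `probability`, which mirrors the behavior the statement describes.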
“…We build upon FastText [13], a neural word embedding known to represent the semantics of identifiers more accurately than other popular embeddings [54]. An additional key benefit of FastText is to avoid the out-of-vocabulary problem that other embeddings, e.g., Word2vec [36] suffer from, by splitting each token into n-grams and by computing a separate vector representation for each n-gram.…”
Section: Representation as Vectors (mentioning)
confidence: 99%
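
The n-gram splitting mentioned above can be sketched as follows. The sketch assumes the FastText convention of wrapping a token in `<` and `>` boundary markers and extracting character n-grams of length 3 to 6. The functions `char_ngrams` and `ngram_overlap` are hypothetical helper names, not part of the FastText library, and the Jaccard overlap is only a rough stand-in for illustration: FastText itself learns a vector per n-gram and combines those vectors.

```python
def char_ngrams(token, min_n=3, max_n=6):
    """Return the set of character n-grams of a token.

    The token is wrapped in '<' and '>' so that prefixes and
    suffixes yield n-grams distinct from word-internal ones.
    """
    wrapped = f"<{token}>"
    grams = set()
    for n in range(min_n, max_n + 1):
        for i in range(len(wrapped) - n + 1):
            grams.add(wrapped[i:i + n])
    return grams


def ngram_overlap(a, b):
    """Jaccard overlap of the n-gram sets of two tokens.

    Illustrative proxy only: it shows why an unseen name can still
    be related to known names through shared subword units.
    """
    grams_a, grams_b = char_ngrams(a), char_ngrams(b)
    return len(grams_a & grams_b) / len(grams_a | grams_b)


if __name__ == "__main__":
    print(sorted(char_ngrams("len", 3, 3)))
    print(ngram_overlap("probability", "probab"))
    print(ngram_overlap("probability", "size"))
```

Because an abbreviated name such as `probab` shares many n-grams with `probability`, a subword-based model can assign it a meaningful vector even if the abbreviation never appeared in the training data.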
“…As stated by Host and Ostvold (2007), even though naming is part of daily life for programmers, it entails a great deal of time and thought: names should convey to others the purpose of the code (Martin, 2008) and reflect the meaning of domain concepts (Marcus et al, 2004). Meaningful identifier names are key to bridging the gap between intention and implementation (Wainakh et al, 2021). Therefore, given that poorly chosen identifier names might hinder source code comprehension (Schankin et al, 2018), using meaningful identifier names is a recommended practice present in several coding style guides and conventions.…”
Section: Introduction (mentioning)
confidence: 99%