2015
DOI: 10.1002/smr.1710

The impact of vocabulary normalization

Abstract: Software development, evolution, and maintenance depend on ever increasing tool support. Recent tools have incorporated increasing analysis of the natural language found in source code, predominately in the identifiers and comments. However, when coders combine abbreviations and acronyms to form multiword identifiers, they, in essence, invent new vocabulary making the source code's vocabulary differ from that of other software artifacts. This vocabulary mismatch is a potential problem for many techniques impor…

Citations: cited by 9 publications (4 citation statements)
References: 38 publications
“…Considerable attention has been devoted to the issue of name lengths, and in particular to whether abbreviations cause a disadvantage relative to using full words [5], [15], [16], [17], [19], [20]. Another favorite topic is controlling the vocabulary used in variable names, and how this reduces ambiguity [4], [7], [9], [10]. Feitelson et al. investigated how variable names are selected [11]. One of their results was that naming is very variable, with little chance that different developers would select exactly the same name for the same variable.…”
Section: Related Work (mentioning)
confidence: 99%
“…Here, the attempted hit [35] or vocabulary match [54] is performed, by matching the phrase within the query to the same phrase occurring within the text of the document at least once. Note that various measurement techniques suggested in the literature are not made use of, for example: inverse document frequency (idf) and the product of term frequency (tf) and idf (tf*idf) [55][56][57], cosine similarity [10,58] and others. These measurement techniques were not used because the results needed to be non-weighted (based on pure 'hits', where tf = 1) and uninfluenced by these and other mathematical techniques.…”
Section: The Irs (mentioning)
confidence: 99%
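
The non-weighted matching described in the statement above can be illustrated with a minimal sketch. It assumes a "hit" means the query phrase occurs at least once in the document text, counted as 1 however often it recurs (tf = 1), with no idf or cosine weighting. The function names and the lowercasing and punctuation handling are hypothetical choices made for this illustration; they are not taken from the citing paper.

```python
import re


def normalize(text: str) -> str:
    """Lowercase the text and join its alphanumeric tokens with single spaces.

    This normalization is an assumption of the sketch, not something the
    quoted passage specifies.
    """
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def is_hit(phrase: str, document: str) -> bool:
    """Return True if the phrase occurs at least once in the document.

    The result is binary: repeated occurrences do not raise the score.
    """
    needle = normalize(phrase)
    if not needle:
        return False
    # Pad both sides with spaces so the match aligns on whole words.
    return f" {needle} " in f" {normalize(document)} "


def hit_count(phrases, document: str) -> int:
    """Count how many query phrases score a hit; each contributes at most 1."""
    return sum(is_hit(phrase, document) for phrase in phrases)


if __name__ == "__main__":
    doc = "Feature location uses identifier-splitting; feature location again."
    queries = ["feature location", "identifier splitting", "cosine"]
    print(hit_count(queries, doc))
```

Because every phrase contributes either 0 or 1, a document that repeats a phrase many times scores the same as one that mentions it once, which is the unweighted behaviour the passage asks for.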
“…Closer to the bias that we analyze is the work of Dit et al and Binkley et al on the impact of various kinds of preprocessing operations of textual data on the effectiveness of feature location and traceability recovery from source code. Both papers compared the impact on feature location algorithms of using different approaches for splitting identifiers.…”
Section: Related Work (mentioning)
confidence: 99%