2011
DOI: 10.1007/978-3-642-22655-7_7

Improving the Tokenisation of Identifier Names

Abstract: Identifier names are the main vehicle for semantic information during program comprehension. Identifier names are tokenised into their semantic constituents by tools supporting program comprehension tasks, including concept location and requirements traceability. We present an approach to the automated tokenisation of identifier names that improves on existing techniques in two ways. First, it improves tokenisation accuracy for identifier names of a single case and those containing digits. Second, pe…
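Identifier names written in a single case, such as `filename` or `MAXVALUE`, carry no camel-case or separator cues, so a conventional splitter returns them whole. As a minimal sketch of one common remedy, and not necessarily the approach taken in this paper, a dictionary-driven dynamic-programming segmentation can recover the constituent words. The word list `DICTIONARY` and the function `split_single_case` are hypothetical illustrations.

```python
from functools import lru_cache

# Hypothetical toy dictionary (assumption); a real tool would need a large word list.
DICTIONARY = {"file", "name", "max", "value", "get", "size", "buffer", "is", "empty"}


def split_single_case(identifier: str, dictionary: set[str] = DICTIONARY) -> list[str]:
    """Split a single-case identifier into dictionary words.

    Uses dynamic programming and prefers the segmentation with the fewest
    tokens; returns the whole (lower-cased) identifier if no full
    segmentation exists.
    """
    word = identifier.lower()
    n = len(word)

    @lru_cache(maxsize=None)
    def best(start: int):
        # Shortest tuple of dictionary words covering word[start:], or None.
        if start == n:
            return ()
        candidates = []
        for end in range(start + 1, n + 1):
            piece = word[start:end]
            if piece in dictionary:
                rest = best(end)
                if rest is not None:
                    candidates.append((piece,) + rest)
        if not candidates:
            return None
        return min(candidates, key=len)

    result = best(0)
    return list(result) if result is not None else [word]


print(split_single_case("filename"))  # ['file', 'name']
print(split_single_case("MAXVALUE"))  # ['max', 'value']
```

Preferring the segmentation with the fewest tokens is only one simple tie-breaking rule; a practical tokeniser would also need a way to rank competing splits, for example by word frequency.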

Cited by 42 publications (41 citation statements)
References 13 publications
“…Text comparison is not restricted to neighboring elements as our type system and the topology analysis are. The general idea is to tokenize declaration names [11] and to calculate the importance of each substring regarding the feature's vocabulary. The vocabulary of a feature consists of all tokens in extent(f).…”
Section: Text Comparison (mentioning; confidence: 99%)
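The citing work tokenises declaration names and weighs each resulting substring against the feature's vocabulary, defined as all tokens in extent(f). The statement does not give the weighting, so the sketch below assumes relative frequency within that vocabulary. The regex tokeniser is a simplified stand-in for the tokeniser of [11], and all function names are hypothetical.

```python
import re
from collections import Counter

# Simplified stand-in tokeniser (assumption): splits camelCase words, upper-case
# runs and digit runs, drops underscores, then lower-cases each token.
TOKEN = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def tokenize(name: str) -> list[str]:
    return [part.lower() for part in TOKEN.findall(name)]


def feature_vocabulary(extent: list[str]) -> Counter:
    """Multiset of all tokens in the declaration names of extent(f)."""
    vocab: Counter = Counter()
    for name in extent:
        vocab.update(tokenize(name))
    return vocab


def importance(token: str, vocab: Counter) -> float:
    """Relative frequency of a token in the feature vocabulary (assumed measure)."""
    total = sum(vocab.values())
    return vocab[token.lower()] / total if total else 0.0


def substring_importance(name: str, vocab: Counter) -> dict[str, float]:
    """Importance of each token of a declaration name with respect to vocab."""
    return {token: importance(token, vocab) for token in tokenize(name)}


extent_f = ["getFileName", "setFileName", "fileSize"]
vocab = feature_vocabulary(extent_f)
print(substring_importance("openFile", vocab))  # {'open': 0.0, 'file': 0.375}
```

Relative frequency rewards tokens shared by many declarations of the feature; if ubiquitous tokens such as `get` should count for less, a TF-IDF weighting across features would be a natural alternative.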
“…Although not directly visible from Table 1, domain knowledge about dependencies between features can have a significant impact on the results of the mining process.¹¹ The influence of knowing dependencies … [Footnote 11: The influence of domain knowledge about mutually exclusive features conceptually also has an influence.]…”
Section: Further Measures (mentioning; confidence: 99%)
“…In contrast, the lexical descriptions of feature and program elements are more ad-hoc and uncertain. It requires much effort to "normalize" in order to make these lexical descriptions useful (Butler et al, 2011).…”
Section: The Impact Of Initial Mapping Results In the First Iteration (mentioning; confidence: 99%)
“…In the first stage, the Java code is parsed using the source code mining tool JIM⁸ (Butler et al 2010), which automates the extraction and analysis of identifiers from source files. It parses the code, extracts the identifiers and splits them into terms, using the INTT⁹ tool (Butler et al 2011) within JIM. INTT uses camel case, separators and other heuristics to split at ambiguous boundaries, like digits and lower case letters.…”
Section: Data Processing (mentioning; confidence: 99%)
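The statement above describes INTT as splitting on camel case and separators and applying heuristics at ambiguous boundaries such as digits. A minimal sketch of that style of splitter follows; it is not INTT's implementation. The `DIGIT_WORDS` list and the rule of re-joining adjacent pieces when their concatenation is a known digit-bearing word are hypothetical stand-ins for whatever heuristics INTT actually applies.

```python
import re

# Hypothetical list of tokens that legitimately contain digits (assumption).
DIGIT_WORDS = {"utf8", "md5", "mp3", "ipv4", "ipv6", "x11", "sha1", "2d", "3d"}

# Explicit separators: underscore, dollar, hyphen, dot.
SEPARATORS = re.compile(r"[_$\-.]+")
# Camel-case boundaries: lower/digit -> Upper, and the last capital of an
# upper-case run before a capitalised word (HTMLParser -> HTML | Parser).
CAMEL = re.compile(r"(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")
# Ambiguous letter/digit boundaries: split here first, possibly re-join below.
LETTER_DIGIT = re.compile(r"(?<=[A-Za-z])(?=\d)|(?<=\d)(?=[A-Za-z])")


def split_identifier(identifier: str) -> list[str]:
    tokens: list[str] = []
    for segment in SEPARATORS.split(identifier):
        if not segment:
            continue  # leading/trailing separators yield empty segments
        pieces = [
            piece
            for camel_part in CAMEL.split(segment)
            for piece in LETTER_DIGIT.split(camel_part)
            if piece
        ]
        # Re-join adjacent pieces when their concatenation is a known digit word.
        merged: list[str] = []
        for piece in pieces:
            if merged and (merged[-1] + piece).lower() in DIGIT_WORDS:
                merged[-1] += piece
            else:
                merged.append(piece)
        tokens.extend(piece.lower() for piece in merged)
    return tokens


for name in ["getFileName", "HTMLParser", "utf8Decoder", "MAX_VALUE2", "draw2DLine"]:
    print(name, split_identifier(name))
```

Splitting at every letter/digit boundary and then re-joining only known digit words is one simple way to resolve the ambiguity: `utf8Decoder` keeps `utf8` together, while an unlisted pair such as `VALUE2` stays split into `value` and `2`.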