Improving the Tokenisation of Identifier Names

Butler, Simon; Wermelinger, Michel; Yu, Yijun; Sharp, Helen

doi:10.1007/978-3-642-22655-7_7

Cited by 42 publications

(41 citation statements)

References 13 publications


“…Text comparison is not restricted to neighboring elements as our type system and the topology analysis are. The general idea is to tokenize declaration names [11] and to calculate the importance of each substring regarding the feature's vocabulary. The vocabulary of a feature consists of all tokens in extent(f).…”

Section: Text Comparison (mentioning)

confidence: 99%


Variability Mining: Consistent Semi-automatic Detection of Product-Line Features

Kästner

Dreiling

Ostermann

2014

IEEE Trans. Software Eng.


Abstract: Software product line engineering is an efficient means to generate a set of tailored software products from a common implementation. However, adopting a product-line approach poses a major challenge and significant risks, since typically legacy code must be migrated toward a product line. Our aim is to lower the adoption barrier by providing semiautomatic tool support, called variability mining, to support developers in locating, documenting, and extracting implementations of product-line features from legacy code. Variability mining combines prior work on concern location, reverse engineering, and variability-aware type systems, but is tailored specifically for the use in product lines. Our work pursues three technical goals: (1) we provide a consistency indicator based on a variability-aware type system, (2) we mine features at a fine level of granularity, and (3) we exploit domain knowledge about the relationship between features when available. With a quantitative study, we demonstrate that variability mining can efficiently support developers in locating features.
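The text-comparison idea quoted above (tokenize declaration names, then weigh each token against the feature's vocabulary) can be sketched as follows. This is only an illustration: the excerpt does not give the importance measure that Kästner et al. use, so the relative-frequency score below is an assumption, and `tokenize`, `feature_vocabulary`, and `name_score` are hypothetical names.

```python
import re
from collections import Counter

def tokenize(name):
    """Split a declaration name into lower-case tokens (simplified rules)."""
    pattern = r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|[0-9]+"
    return [t.lower() for t in re.findall(pattern, name)]

def feature_vocabulary(names):
    """Count every token of the names already assigned to a feature."""
    vocab = Counter()
    for name in names:
        vocab.update(tokenize(name))
    return vocab

def name_score(name, vocab):
    """Assumed measure: mean relative frequency of the name's tokens
    in the feature vocabulary (0.0 when nothing matches)."""
    tokens = tokenize(name)
    total = sum(vocab.values())
    if not tokens or total == 0:
        return 0.0
    return sum(vocab[t] for t in tokens) / (total * len(tokens))

# Names assumed to belong to a hypothetical "locking" feature.
vocab = feature_vocabulary(["lockAccount", "unlockAccount", "AccountLockListener"])
candidates = ["isAccountLocked", "renderPage"]
ranked = sorted(candidates, key=lambda n: name_score(n, vocab), reverse=True)
```

Here `isAccountLocked` ranks above `renderPage` because it shares the token `account` with the vocabulary. Note that `locked` does not match `lock`: a real tool would likely add stemming, which this sketch leaves out.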


“…Although not directly visible from Table 1, domain knowledge about dependencies between features can have a significant impact on the results of the mining process. 11 The influence of knowing dependencies 11. The influence of domain knowledge about mutually exclusive features conceptually also has an influence.…”

Section: Further Measures (mentioning)

confidence: 99%

Variability Mining: Consistent Semi-automatic Detection of Product-Line Features

Kästner

Dreiling

Ostermann

2014

IEEE Trans. Software Eng.


“…In contrast, the lexical descriptions of feature and program elements are more ad-hoc and uncertain. It requires much effort to "normalize" in order to make these lexical descriptions useful (Butler et al, 2011).…”

Section: The Impact Of Initial Mapping Results In the First Iteration (mentioning)

confidence: 99%

Improving feature location using structural similarity and iterative graph mapping

Peng

Xing

Tan

et al. 2013

Journal of Systems and Software

Self Cite


Abstract. Locating program element(s) relevant to a particular feature is an important step in efficient maintenance of a software system. The existing feature location techniques analyze each feature independently and perform a one-time analysis after being provided an initial input. As a result, these techniques are sensitive to the quality of the input. In this paper, we propose to address the above issues in feature location using an iterative context-aware approach. The underlying intuition is that features are not independent of each other, and the structure of source code resembles the structure of features. The distinguishing characteristics of the proposed approach are: (1) it takes into account the structural similarity between a feature and a program element to determine feature-element relevance; (2) it employs an iterative process to propagate the relevance of the established mappings between a feature and a program element to the neighboring features and program elements. We evaluate our approach using two different systems, DirectBank, a small-scale industry financial system, and Linux kernel, a large-scale open-source operating system. Our evaluation suggests that the proposed approach is more robust and can significantly increase the recall of feature location with only a minor decrease of precision.


“…In the first stage, the Java code is parsed using the source code mining tool JIM 8 (Butler et al 2010), which automates the extraction and analysis of identifiers from source files. It parses the code, extracts the identifiers and splits them into terms, using the INTT 9 tool (Butler et al 2011) within JIM. INTT uses camel case, separators and other heuristics to split at ambiguous boundaries, like digits and lower case letters.…”

Section: Data Processing (mentioning)

confidence: 99%

Locating bugs without looking back

Dilshener

Wermelinger

2017

Autom Softw Eng

Self Cite


Bug localisation is a core program comprehension task in software maintenance: given the observation of a bug, e.g. via a bug report, where is it located in the source code? Information retrieval (IR) approaches see the bug report as the query, and the source code files as the documents to be retrieved, ranked by relevance. Such approaches have the advantage of not requiring expensive static or dynamic analysis of the code. However, current state-of-the-art IR approaches rely on project history, in particular previously fixed bugs or previous versions of the source code. We present a novel approach that directly scores each current file against the given report, thus not requiring past code and reports. The scoring method is based on heuristics identified through manual inspection of a small sample of bug reports. We compare our approach to eight others, using their own five metrics on their own six open source projects. Out of 30 performance indicators, we improve 27 and equal 2. Over the projects analysed, on average we find one or more affected files in the top 10 ranked files for 76% of the bug reports. These results show the applicability of our approach to software projects without history.
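The identifier splitting mentioned in the citation statement above (camel case, separators, and extra handling at digits and lower-case letters) can be illustrated with a minimal sketch. It is not INTT's implementation: `split_identifier` is a hypothetical helper, and it simply splits at every digit run, whereas the quoted text says INTT applies heuristics at such ambiguous boundaries.

```python
import re

# Token shapes tried left to right at each position (simplified rules).
_TOKEN = re.compile(
    r"[A-Z]+(?=[A-Z][a-z])"  # acronym before a capitalised word: "HTTP" in "HTTPResponse"
    r"|[A-Z]?[a-z]+"         # lower-case or capitalised word
    r"|[A-Z]+"               # remaining upper-case run
    r"|[0-9]+"               # digit run (always split here; an assumption)
)

def split_identifier(name: str) -> list[str]:
    """Split an identifier into lower-case terms."""
    terms = []
    # First cut at explicit separators, then at case and digit boundaries.
    for part in re.split(r"[_$]+", name):
        terms.extend(m.group(0).lower() for m in _TOKEN.finditer(part))
    return terms

print(split_identifier("parseHTTPResponse"))  # ['parse', 'http', 'response']
print(split_identifier("MAX_BUFFER_SIZE"))    # ['max', 'buffer', 'size']
print(split_identifier("utf8Decoder"))        # ['utf', '8', 'decoder']
```

The last call shows why digits are ambiguous: `utf8` is arguably one term, so a fixed rule like this one will oversplit some names.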
