Text retrieval-based feature location techniques (FLTs) use information from the terms present in documents in classes and methods. However, relevant terms originating from certain locations (eg, method names) often comprise only a small part of the entire method lexicon. Feature location techniques should benefit from techniques that make greater use of this information. The primary objective of this study was to investigate how weighting terms from different locations in source code can improve a latent Dirichlet allocation (LDA)-based FLT. We conducted an empirical study of 4 subject software systems and 372 features. For each subject system, we trained 1024 different LDA models with new weighting schemes applied to leading comments, method names, parameters, body comments, and local variables. We conducted both quantitative and qualitative analyses to identify the effects of the weighting schemes on the performance of the LDA-based FLT. We evaluated the weighting schemes based on mean reciprocal rank and spread of effectiveness measures. In addition, we conducted a factorial analysis to identify which locations have a main impact on the results of the FLT. We then examined the effects of adding information from class comments, class names, and fields to the top 10 configurations for each system. This resulted in an additional 640 different LDA models for each system. From our results, we identified a significant effect on the performance of the LDA-based FLT when our weighting schemes were applied. Furthermore, we found that adding information from each method's containing class can improve the effectiveness of an LDA-based FLT. Finally, we identified a set of recommendations for identifying better weighting schemes for LDA.
KEYWORDS
feature location, program comprehension, static analysis, term weighting, text retrieval
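As an illustration of two ideas named in the abstract, the following minimal sketch shows one way terms could be weighted by their originating location before a model is trained, and how mean reciprocal rank is conventionally computed. It is not the study's implementation: the weight values, the location keys, and the function names `weighted_term_counts` and `mean_reciprocal_rank` are assumptions introduced here for illustration, and scaling term counts by an integer weight is only one possible weighting mechanism.

```python
from collections import Counter

# Hypothetical integer weights per term location (illustrative values only,
# not the weighting schemes evaluated in the study).
LOCATION_WEIGHTS = {
    "leading_comment": 1,
    "method_name": 4,
    "parameter": 2,
    "body_comment": 1,
    "local_variable": 1,
}


def weighted_term_counts(terms_by_location, weights):
    """Build a bag of words for one method, scaling each term occurrence
    by the weight of the location it came from (default weight 1)."""
    counts = Counter()
    for location, terms in terms_by_location.items():
        weight = weights.get(location, 1)
        for term in terms:
            counts[term] += weight
    return counts


def mean_reciprocal_rank(first_relevant_ranks):
    """Average 1/rank over all queries (features).

    Each entry is the 1-based rank of the first relevant method returned
    for a query, or None if no relevant method was retrieved (scored 0).
    """
    if not first_relevant_ranks:
        raise ValueError("at least one query is required")
    total = sum(1.0 / rank if rank else 0.0 for rank in first_relevant_ranks)
    return total / len(first_relevant_ranks)


if __name__ == "__main__":
    # Preprocessed terms of a made-up method, grouped by location.
    method_terms = {
        "method_name": ["open", "file"],
        "body_comment": ["open"],
    }
    print(weighted_term_counts(method_terms, LOCATION_WEIGHTS))
    print(mean_reciprocal_rank([1, 2, 4]))
```

In this sketch a term from a method name counts four times as much as the same term in a body comment, so the resulting counts could be passed to any topic-model trainer that accepts bag-of-words input. A higher mean reciprocal rank means the first relevant method tends to appear nearer the top of the ranked results.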
INTRODUCTION
Software features are functionalities that are accessible to developers and users. During the evolution of a software system, developers change the source code to add new features, enhance existing features, and remove defective features (bugs). When developers are tasked with changing the source code of a large or unfamiliar system, they must spend considerable time and effort on program comprehension activities to gain the knowledge needed to implement changes. Part of this process is called feature location, a software evolution task in which a developer locates the source code entities (eg, methods and classes) that implement a functionality (feature) [1]. During software evolution, developers must perform feature location to identify the entities needed to add new functionalities, modify existing functionalities, or remove defects.

Given the scale of modern software systems, manual feature location is impractical.

built from one of these techniques does not contain information regarding the originating element of a term (eg, method name, local variable, and parameter), but rather contains only the (preprocessed) terms. The impacts o...