Source code search is an important tool used by software engineers. However, relatively little is known so far about what developers search for in source code and why. This paper addresses this knowledge gap. We present the results of a log file analysis of a source code search engine. The data from the log file was analyzed together with the change history of four development and maintenance systems. The results show that most of the search targets were not changed after being downloaded; we therefore concluded that the developers conducted searches to find reusable components, to obtain coding examples, or to perform impact analysis. In contrast, maintainers often change the code they have downloaded. Moreover, we automatically categorized the search queries. The most popular categories were method name, structural pattern, and keyword. The major search target was a statement. Although the selected data set was small, the deviations between the systems were negligible; we therefore conclude that our results are valid.
Global enterprises face increasingly complex software systems. Although size and complexity are two different aspects of a software system, various size metrics have traditionally been used to indicate complexity. In fact, many established software metrics correlate with the number of lines of code. Moreover, combining multiple metrics collected at the bottom layers into one comprehensible and meaningful indicator for an entire system is not a trivial task. This paper proposes a novel interpretation of an entropy-based metric to assess the design of a software system in terms of interface quality and understandability. The proposed metric is independent of the system size and delivers a single value, eliminating the unnecessary aggregation step. Further, an industrial case study has been conducted to illustrate the usefulness of this metric.
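The abstract does not define the proposed metric, so the following is only a generic sketch of why an entropy-based measure can be independent of system size. It computes normalized Shannon entropy over a list of counts; the function name `normalized_entropy` and the idea of feeding it per-interface usage counts are assumptions made for illustration, not the paper's definition.

```python
import math


def normalized_entropy(counts):
    """Shannon entropy of a count distribution, scaled to the range [0, 1].

    Illustrative only: the input could be, for example, how often each
    interface of a system is used (an assumption, not the paper's metric).
    """
    positive = [c for c in counts if c > 0]
    total = sum(positive)
    if len(positive) < 2:
        # Zero or one occupied category carries no uncertainty.
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in positive)
    # Dividing by the maximum possible entropy removes the dependence
    # on the number of categories.
    return entropy / math.log2(len(positive))
```

Because the value depends only on the proportions, scaling every count by the same factor leaves it unchanged: `normalized_entropy([10, 10])` and `normalized_entropy([1000, 1000])` give the same result.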
While analyzing a log file of a text-based source code search engine, we discovered that developers search for fine-grained syntactical patterns in 36% of queries. Currently, to cope with queries of this kind, developers need to use regular expressions, add redundant terms to the query, or combine searching with other tools provided by the development environment. To improve their expressiveness, such queries can be formulated as tree patterns over abstract syntax trees (ASTs). These search patterns can be expressed using query languages such as XPath. However, developers usually do not work with either XPath or ASTs. To shield developers from the complexity of query formulation, we propose using sample code snippets as queries. The novelty of our approach is the combination of a query language that is very close to the surface programming language and a special database technology to store a large number of abstract syntax trees. The advantage of this approach over existing source code query languages and search engines is the performance of both query formulation and query execution. This paper describes the technical details of the method and illustrates the value of this approach with performance measures and an industrial controlled experiment. All developers were able to complete the tasks of the experiment faster and more accurately by using our tool (ACS) than by using a text-based search engine. The number of false positives in the result lists was significantly decreased.
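As a rough illustration of using a code snippet as a tree pattern, the sketch below matches a snippet against source code with Python's standard `ast` module. It is not the ACS implementation, which the abstract describes as relying on a dedicated AST database. The `_` wildcard convention and the names `matches`, `search`, and `SAMPLE` are assumptions made for this example.

```python
import ast

WILDCARD = "_"  # hypothetical convention: `_` in the snippet matches any subtree


def _same(p, n):
    """Compare one field value of the pattern with the corresponding node value."""
    if isinstance(p, ast.AST):
        return isinstance(n, ast.AST) and matches(p, n)
    if isinstance(p, list):
        return (isinstance(n, list) and len(p) == len(n)
                and all(_same(a, b) for a, b in zip(p, n)))
    return p == n


def matches(pattern, node):
    """Return True if `node` has the same tree shape as `pattern`."""
    if isinstance(pattern, ast.Name) and pattern.id == WILDCARD:
        return True
    if type(pattern) is not type(node):
        return False
    return all(_same(value, getattr(node, field, None))
               for field, value in ast.iter_fields(pattern))


def search(snippet, source):
    """Return the sorted line numbers where the expression snippet occurs."""
    pattern = ast.parse(snippet, mode="eval").body
    tree = ast.parse(source)
    return sorted(n.lineno for n in ast.walk(tree) if matches(pattern, n))


SAMPLE = (
    'f = open(path, "w")\n'
    'g = open(path, "r")\n'
    'log("open(path, \'w\')")\n'
    'h = open(other.name, "w")\n'
)
```

Calling `search('open(_, "w")', SAMPLE)` reports lines 1 and 4. A plain text search for `open(` combined with `w` would also hit the string literal on line 3, which is the kind of false positive a structural query avoids.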
Error correction during application maintenance is a difficult activity. Finding the source code that implements certain real-world concepts is an essential part of maintenance and is called concept location. The research community has mainly addressed concept location in legacy systems. The difficulty of concept location in service-oriented software systems, however, is as yet unknown but thought to be seriously underestimated. This paper discusses characteristics of service-oriented enterprise software systems that complicate or support concept location. The conclusions are based on experiments with two industrial service-oriented enterprise resource planning systems.
Enabling fast and detailed insights into large portions of source code is an important task in a global development ecosystem. Numerous data structures have been developed to store source code and to support various structural queries that help in navigation, evaluation, and analysis. Many of these data structures work with tree-based or graph-based representations of source code. The goal of this project is to develop a data store that enables efficient storage and fast querying of structural information. The naive adjacency list method has been enhanced with recent data compression approaches for column-oriented databases to allow lossless yet compact storage of fine-grained structural data. Graph indexing enables the proposed data model to answer fine-grained structural queries quickly. This paper describes the basics of the proposed approach and illustrates its technical feasibility.
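The abstract does not say which compression scheme or index is used. As a minimal sketch of the general idea, the example below stores an adjacency list column-wise, compresses the sorted parent column with run-length encoding (one common column-store technique, chosen here as an assumption), and keeps a small offset index for child lookups. The names `ColumnarEdges`, `rle_encode`, and `rle_decode` are hypothetical.

```python
from itertools import groupby


def rle_encode(column):
    """Run-length encode a column as (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(column)]


def rle_decode(runs):
    """Restore the original column; the encoding is lossless."""
    return [value for value, count in runs for _ in range(count)]


class ColumnarEdges:
    """Adjacency list split into a compressed parent column and a child column."""

    def __init__(self, edges):
        edges = sorted(edges)
        # Sorting groups equal parents together, so each parent forms one run.
        self.parents = rle_encode([parent for parent, _ in edges])
        self.children = [child for _, child in edges]
        # Index: parent id -> (offset into the child column, number of children).
        self.index = {}
        offset = 0
        for parent, count in self.parents:
            self.index[parent] = (offset, count)
            offset += count

    def children_of(self, parent):
        """Return the children of `parent` without decoding the parent column."""
        offset, count = self.index.get(parent, (0, 0))
        return self.children[offset:offset + count]
```

For AST-like data, where a node typically has several children, the parent column collapses to one pair per parent, while lookups go through the index and read only the relevant slice of the child column.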