How Well Do Search Engines Support Code Retrieval on the Web?

Sim, Susan Elliott; Umarji, Medha; Ratanotayanon, Sukanya; Lopes, Cristina Videira

doi:10.1145/2063239.2063243

Cited by 84 publications

(79 citation statements)

References 57 publications

“…We stopped mining a little over 10M classes because of a noticeable delay in processing queries on our servers (unacceptable when users expect queries to be processed in a few seconds). However, we also thought 10M classes allowed us to closely approximate an Internet scale code search engine when compared against the sizes of the Internet code search engines Koders (600K), Krugle (3.5M), and Google Code Search (2.5M) [108].…”

Section: Instantiating the Index For Experiments (mentioning)

confidence: 99%

“…Given the advantages and disadvantages of search engines today and the importance of search in developing software (programmers report searching for code frequently as part of their practice [97], [108]), software engineering researchers are investigating how to improve code search engines. Some, for instance, have been investigating how to support more expressive queries (e.g., searching by test case or method signatures) that afford more precise matching of code compared to keywords (e.g., [4], [15], [54], [65], [71], [82], [91], [110], [122], [142]).…”

mentioning

confidence: 99%

Understanding the impact of support for iteration on code search

Martie, Hoek, Kwak

2017

Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering

“…Recent studies have revealed that programmers typically use general search engines to find code for reuse [21]. More specialized code search engines (e.g., Koders, Krugle, ohloh) incorporate various filtering capabilities (e.g., language, domain, scores) and program syntax into the query to better guide the matching process [21]. Other approaches add natural language processing to increase the potential matches [8], [16].…”

Section: Related Work (mentioning)

confidence: 99%

“…INTRODUCTION: Today, searching for code is a regular activity for most programmers [21]. Yet, the mechanisms to support this activity have barely evolved in the last decade, and the limitations are becoming more apparent as code repositories get richer and programmers' expertise and needs more diverse.…”

mentioning

confidence: 99%

Solving Semantic Searches for Source Code

Stolee, Elbaum, Dobos

2012

Abstract: Programmers search for code frequently utilizing syntactic queries. The effectiveness of this type of search depends on the ability of a programmer to specify a query that captures how the desired code may have been implemented, and the results often include many irrelevant matches that must be filtered manually. More semantic search approaches could address these limitations, yet the existing approaches either do not scale or require the programmer to define complex queries. Instead, our approach to semantic search requires the programmer to write lightweight, incomplete specifications, such as an example input and expected output of a desired function. Unlike existing approaches to semantic search, we use an SMT solver to identify programs in a repository, encoded as constraints, that match the programmer-provided specification. We instantiate the approach on subsets of the Java string library, Yahoo! Pipes mashup language, and SQL select statements, and begin to assess its effectiveness and efficiency through evaluations in each domain.

I. INTRODUCTION

Today, searching for code is a regular activity for most programmers [21]. Yet, the mechanisms to support this activity have barely evolved in the last decade, and the limitations are becoming more apparent as code repositories get richer and programmers' expertise and needs more diverse.

Consider a novice Java programmer who is trying to find a snippet of code that extracts an alias from an e-mail address. The programmer turns to Google (like many others [21]) and issues a search query with the following keywords: extract alias from e-mail address in Java. As expected, the query returns millions of results. None of the top ten results (a typical IR measure to assess the precision of search engine results [5]) even provide a method for decomposing an e-mail address into parts, which is the first step towards extracting the alias. Now, if the programmer is knowledgeable enough about the domain to refine the query with the term substring, then the top ten results include two relevant solutions. This illustrates what occurs in practice, where programmers must sift through many irrelevant results, especially when the desired behavior cannot be tied to source code syntax or documentation.

Our work targets this limitation. The general idea is that programmers provide concrete behavioral specifications as inputs and outputs and an SMT solver identifies available source code, encoded as constraints, that matches the specifications.

For example, when searching for a program that extracts the alias from an e-mail address, the input could be the string "susie@mail.com" and the output the string "susie". This form of query, while more costly than a keyword query, lets the programmer specify the desired behavior, without the need to know how to achieve a certain outcome, just what that outcome is.
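The input/output style of query described in this abstract can be illustrated with a small sketch. It is not the authors' method: the paper encodes programs as constraints and uses an SMT solver, whereas the sketch below simply executes each candidate on the examples. The `CANDIDATES` table, its function names, and `search_by_example` are hypothetical and exist only for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical mini "repository" of candidate string functions (illustrative only).
CANDIDATES: Dict[str, Callable[[str], str]] = {
    "upper": lambda s: s.upper(),
    "before_at": lambda s: s[: s.index("@")] if "@" in s else s,
    "after_at": lambda s: s[s.index("@") + 1 :] if "@" in s else s,
    "first_five": lambda s: s[:5],
}


def search_by_example(examples: List[Tuple[str, str]]) -> List[str]:
    """Return the names of candidates whose output matches every (input, output) pair.

    Brute-force execution stands in for the SMT-based matching in the paper.
    """
    matches = []
    for name, fn in CANDIDATES.items():
        try:
            if all(fn(inp) == out for inp, out in examples):
                matches.append(name)
        except Exception:
            # A candidate that fails on an example is treated as a non-match.
            continue
    return sorted(matches)


# One example is an incomplete specification, so more than one candidate may fit.
one_example = search_by_example([("susie@mail.com", "susie")])
# A second example narrows the matches.
two_examples = search_by_example([("susie@mail.com", "susie"), ("bob@mail.com", "bob")])
```

With the single example from the abstract, both `before_at` and `first_five` satisfy the specification, which shows why such specifications are called incomplete; adding a second example leaves only `before_at`.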

Searching crowd knowledge to recommend solutions for API usage tasks

Campos, Souza, Maia

2016

J Software Evolu Process

Stack Overflow (SO) is a question and answer service directed to issues related to software development. In SO, developers post questions related to a programming topic and other members of the site can provide answers to help them. The information available on this type of service is also known as 'crowd knowledge' and currently is one important trend in supporting activities related to software development.

We present an approach that makes use of 'crowd knowledge' in SO to recommend information that can assist developer activities. This strategy recommends a ranked list of question-answer pairs from SO based on a query. The criteria for ranking are based on three main aspects: the textual similarity of the pairs with respect to the query related to the developer's problem, the quality of the pairs, and a filtering mechanism that considers only 'how-to' posts. We conducted an experiment considering programming problems on three different topics (Swing, Boost and LINQ) widely used by the software development community to evaluate the proposed recommendation strategy. The results have shown that, for the Lucene + Score + How-to approach, 77.14% of the assessed activities have at least one recommended pair that proved to be useful for the target programming problem.
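The three ranking criteria named in this abstract (textual similarity, pair quality, and a 'how-to' filter) can be sketched roughly as follows. This is an assumption-laden illustration, not the authors' implementation: Jaccard token overlap stands in for Lucene's similarity, the vote count stands in for the quality score, the `is_how_to` flag is assumed to be assigned beforehand, and the weight `alpha` is an arbitrary illustrative choice.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QAPair:
    title: str
    votes: int        # stand-in for the pair's quality score (assumption)
    is_how_to: bool   # assumed to be labeled in advance by some classifier


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; a simple stand-in for Lucene's scoring."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def rank(query: str, pairs: List[QAPair], alpha: float = 0.7) -> List[QAPair]:
    """Keep only how-to pairs, then order by a weighted mix of similarity and votes."""
    how_to = [p for p in pairs if p.is_how_to]
    max_votes = max((p.votes for p in how_to), default=0) or 1

    def score(p: QAPair) -> float:
        quality = max(p.votes, 0) / max_votes
        return alpha * jaccard(query, p.title) + (1 - alpha) * quality

    return sorted(how_to, key=score, reverse=True)


pairs = [
    QAPair("how to sort a list in LINQ", votes=10, is_how_to=True),
    QAPair("what is LINQ", votes=50, is_how_to=False),
    QAPair("how to read a file in Boost", votes=5, is_how_to=True),
]
ranked_titles = [p.title for p in rank("sort a list in LINQ", pairs)]
```

The highly voted "what is LINQ" pair is dropped by the filter before scoring, which mirrors the role the abstract gives to the how-to criterion.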
