Software developers rarely write code from scratch. With the availability of Wikipedia, discussion forums, books, and blogs, it is hard to imagine a software developer not consulting these sources for sample code while building any non-trivial software system. While researchers have proposed approaches to retrieve relevant posts and code snippets, the need for finding variant implementations of functionally similar code snippets has been ignored. In this work, we propose an approach to automatically create a repository of structurally heterogeneous but functionally similar source code examples from unstructured sources. We evaluate the approach on Stack Overflow, a discussion forum that has approximately 19 million posts. The results of our evaluation indicate that the approach extracts structurally different snippets with a precision of 83%. A repository of such heterogeneous source code examples will be useful to programmers in learning different implementation strategies and to researchers working on problems such as program comprehension, semantic clones, and code search.
Code search with natural language terms performs poorly because programming concepts do not always lexically match their syntactic forms. For example, in Java, the programming concept "array" does not lexically match its syntactic representation, [ ]. Code search engines could assist developers more effectively with natural language queries if such mappings existed for a variety of programming languages. In this work, we present a programming-language-agnostic technique to discover such mappings between syntactic forms and natural language terms representing programming concepts. We use the questions and answers in Stack Overflow to create this mapping. We implement our approach in a tool called Anne. To evaluate its effectiveness, we conduct a user study in an academic setting in which teaching assistants use Anne to search for code snippets in student submissions. With the use of Anne, we find that the participants are 29% quicker, with no significant drop in correctness and completeness.
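The lexical mismatch described above can be illustrated with a minimal sketch of query expansion. This is not Anne's implementation: the mapping table, the function names, and the substring matching are all assumptions made for illustration, and the mapping is hand-written here rather than mined from Stack Overflow.

```python
# Hypothetical concept-to-syntax mapping, hand-written for illustration.
# The technique in the abstract would discover such entries from
# Stack Overflow questions and answers instead.
CONCEPT_TO_SYNTAX = {
    "array": ["[]"],
    "loop": ["for", "while"],
}


def expand_query(query):
    """Return the query terms followed by any syntactic forms mapped to them."""
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        expanded.extend(CONCEPT_TO_SYNTAX.get(term, []))
    return expanded


def search(query, snippets):
    """Return snippets containing at least one expanded term.

    Matching is naive substring matching, which is enough to show the idea
    but would produce false positives on real code.
    """
    terms = expand_query(query)
    return [s for s in snippets if any(t in s for t in terms)]


snippets = [
    "int[] values = new int[10];",
    "String name = reader.readLine();",
]

# The word "array" appears in neither snippet, but its mapped form "[]" does.
matches = search("array", snippets)
print(matches)
```

Without the expansion step, the query "array" would match nothing in these snippets, which is the failure mode the abstract attributes to plain natural language code search.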