Daniel Dobos scite author profile

Solving the Search for Source Code

Programmers frequently search for source code to reuse using keyword searches. The search effectiveness in facilitating reuse, however, depends on the programmer's ability to specify a query that captures how the desired code may have been implemented. Further, the results often include many irrelevant matches that must be filtered manually. More semantic search approaches could address these limitations, yet existing approaches are either not flexible enough to find approximate matches or require the programmer to define complex specifications as queries. We propose a novel approach to semantic code search that addresses several of these limitations and is designed for queries that can be described using a concrete input/output example. In this approach, programmers write lightweight specifications as inputs and expected output examples. Unlike existing approaches to semantic search, we use an SMT solver to identify programs or program fragments in a repository, which have been automatically transformed into constraints using symbolic analysis, that match the programmer-provided specification. We instantiated and evaluated this approach in subsets of three languages, the Java String library, Yahoo! Pipes mashup language, and SQL select statements, exploring its generality, utility, and trade-offs. The results indicate that this approach is effective at finding relevant code, can be used on its own or to filter results from keyword searches to increase search precision, and is adaptable to find approximate matches and then guide modifications to match the user specifications when exact matches do not already exist. These gains in precision and flexibility come at the cost of performance, for which underlying factors and mitigation strategies are identified.

Solving Semantic Searches for Source Code

Stolee, Elbaum, Dobos

2012

Abstract: Programmers frequently search for code using syntactic queries. The effectiveness of this type of search depends on the ability of a programmer to specify a query that captures how the desired code may have been implemented, and the results often include many irrelevant matches that must be filtered manually. More semantic search approaches could address these limitations, yet the existing approaches either do not scale or require the programmer to define complex queries. Instead, our approach to semantic search requires the programmer to write lightweight, incomplete specifications, such as an example input and expected output of a desired function. Unlike existing approaches to semantic search, we use an SMT solver to identify programs in a repository, encoded as constraints, that match the programmer-provided specification. We instantiate the approach on subsets of the Java string library, Yahoo! Pipes mashup language, and SQL select statements, and begin to assess its effectiveness and efficiency through evaluations in each domain.

I. INTRODUCTION

Today, searching for code is a regular activity for most programmers [21]. Yet the mechanisms to support this activity have barely evolved in the last decade, and the limitations are becoming more apparent as code repositories get richer and programmers' expertise and needs more diverse.

Consider a novice Java programmer who is trying to find a snippet of code that extracts an alias from an e-mail address. The programmer turns to Google (like many others [21]) and issues a search query with the following keywords: extract alias from e-mail address in Java. As expected, the query returns millions of results. None of the top ten results (a typical IR measure to assess the precision of search engine results [5]) even provides a method for decomposing an e-mail address into parts, which is the first step towards extracting the alias. Now, if the programmer is knowledgeable enough about the domain to refine the query with the term substring, then the top ten results include two relevant solutions. This illustrates what occurs in practice, where programmers must sift through many irrelevant results, especially when the desired behavior cannot be tied to source code syntax or documentation.

Our work targets this limitation. The general idea is that programmers provide concrete behavioral specifications as inputs and outputs, and an SMT solver identifies available source code, encoded as constraints, that matches the specifications.

For example, when searching for a program that extracts the alias from an e-mail address, the input could be the string "susie@mail.com" and the output the string "susie". This form of query, while more costly than a keyword query, lets the programmer specify the desired behavior without needing to know how to achieve a certain outcome, only what that outcome is.
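
The idea of querying by input/output example can be illustrated with a minimal Python sketch. It is not the method described in the abstract: instead of encoding repository code as constraints and calling an SMT solver, it simply executes each candidate on the example input and compares the result. The repository contents and the names `CANDIDATES` and `search_by_example` are hypothetical and exist only for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical mini-repository of string snippets (illustrative names only).
CANDIDATES: Dict[str, Callable[[str], str]] = {
    "before_at": lambda s: s[: s.index("@")],
    "after_at": lambda s: s[s.index("@") + 1:],
    "upper": lambda s: s.upper(),
    "first_five": lambda s: s[:5],
}


def search_by_example(
    examples: List[Tuple[str, str]],
    candidates: Dict[str, Callable[[str], str]] = CANDIDATES,
) -> List[str]:
    """Return names of candidates whose output equals the expected output
    for every (input, output) example.

    Concrete execution stands in here for the constraint solving that the
    abstract describes; it is an assumption made to keep the sketch small.
    """
    matches = []
    for name, fn in candidates.items():
        try:
            if all(fn(inp) == out for inp, out in examples):
                matches.append(name)
        except (ValueError, IndexError):
            # A candidate that fails on an example input is not a match.
            continue
    return matches


if __name__ == "__main__":
    # One example is an incomplete specification: two snippets satisfy it.
    print(search_by_example([("susie@mail.com", "susie")]))
    # A second example rules out the coincidental match.
    print(search_by_example([("susie@mail.com", "susie"),
                             ("bob@mail.com", "bob")]))
```

With the single example from the text, both `before_at` and `first_five` match, because "susie" happens to be five characters long. Adding a second example leaves only `before_at`, which shows why a lightweight specification may need refinement to exclude coincidental matches.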

A comparative study of anomaly detection methods for gross error detection problems

Dobos, Nguyen, Dang et al.

2023

Computers & Chemical Engineering

Weighted ensemble of gross error detection methods based on particle swarm optimization

Dobos, Nguyen, McCall et al.

2021

Gross errors, a kind of non-random error caused by process disturbances or leaks, can make reconciled estimates very inaccurate and even infeasible. Detecting gross errors thus prevents financial loss from incorrect accounting and also identifies potential environmental consequences of leaks. In this study, we develop an ensemble of gross error detection (GED) methods to improve the effectiveness of gross error identification on measurement data. We propose a weighted combining method on the outputs of all constituent GED methods and then compare the combined result to a threshold to decide whether a gross error is present. We generate a set of measurements with or without gross error and then minimize the GED error rate of the proposed ensemble on this set with respect to the combining weights and threshold. The Particle Swarm Optimization method is used to solve this optimization problem. Experiments conducted on a simulated system show that our ensemble is better than all constituent GED methods and two ensemble methods.

CCS CONCEPTS: • Computing methodologies → Ensemble methods; • Mathematics of computing → Evolutionary algorithms;
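
The weighted combination, threshold, and error-rate objective described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the weighted sum as the combining rule, the toy detector scores in `TOY`, the PSO coefficients (0.7, 1.5, 1.5), and all function names are hypothetical choices made for the example.

```python
import random
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], bool]  # (detector outputs, gross error present?)

# Made-up outputs of three hypothetical GED methods on six measurement sets.
TOY: List[Sample] = [
    ((0.9, 0.2, 0.8), True), ((0.1, 0.3, 0.2), False),
    ((0.6, 0.1, 0.5), True), ((0.2, 0.9, 0.5), False),
    ((0.7, 0.3, 0.9), True), ((0.3, 0.8, 0.1), False),
]


def ensemble_decision(scores, weights, threshold) -> bool:
    """Flag a gross error when the weighted sum of outputs exceeds the threshold."""
    return sum(w * s for w, s in zip(weights, scores)) > threshold


def error_rate(params: Sequence[float], data: List[Sample]) -> float:
    """Fraction of misclassified samples; params = weights followed by threshold."""
    *weights, threshold = params
    wrong = sum(ensemble_decision(s, weights, threshold) != y for s, y in data)
    return wrong / len(data)


def pso(data: List[Sample], n_detectors: int, particles: int = 20,
        iters: int = 50, seed: int = 0) -> Tuple[List[float], float]:
    """Minimal particle swarm search over (weights, threshold)."""
    rng = random.Random(seed)
    dim = n_detectors + 1
    # Particle 0 starts at an equal-weight baseline, so the returned
    # error rate can never be worse than that baseline.
    pos = [[1.0 / n_detectors] * n_detectors + [0.5]]
    pos += [[rng.random() for _ in range(dim)] for _ in range(particles - 1)]
    vel = [[0.0] * dim for _ in range(particles)]
    best = [p[:] for p in pos]
    best_err = [error_rate(p, data) for p in pos]
    g = min(range(particles), key=best_err.__getitem__)
    g_pos, g_err = best[g][:], best_err[g]
    for _ in range(iters):
        for i in range(particles):
            for d in range(dim):
                # Inertia plus pulls toward personal and global bests.
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * rng.random() * (best[i][d] - pos[i][d])
                             + 1.5 * rng.random() * (g_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            err = error_rate(pos[i], data)
            if err < best_err[i]:
                best[i], best_err[i] = pos[i][:], err
                if err < g_err:
                    g_pos, g_err = pos[i][:], err
    return g_pos, g_err


if __name__ == "__main__":
    baseline = [1 / 3, 1 / 3, 1 / 3, 0.5]
    params, err = pso(TOY, n_detectors=3)
    print("baseline error:", error_rate(baseline, TOY))
    print("optimized error:", err, "params:", params)
```

Because the objective is a misclassification rate, it is piecewise constant and has no useful gradient, which is one reason a population-based search such as PSO is a natural fit for tuning the weights and threshold together.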
