27th International Conference on Intelligent User Interfaces 2022
DOI: 10.1145/3490099.3511157
Better Together? An Evaluation of AI-Supported Code Translation

Abstract: Generative machine learning models have recently been applied to source code, for use cases including translating code between programming languages, creating documentation from code, and auto-completing methods. Yet, state-of-the-art models often produce code that is erroneous or incomplete. In a controlled study with 32 software engineers, we examined whether such imperfect outputs are helpful in the context of Java-to-Python code translation. When aided by the outputs of a code translation model, participan…


Cited by 38 publications (13 citation statements)
References 80 publications
“…Empirical evaluations of this model have shown that, although the quality of its outputs is quite good, those outputs may still be problematic [57]. Echoing the results from Weisz et al. [103], human-centered evaluations of Copilot have found that it increases users' feelings of productivity [109], and that almost a third (27%) of its proposed code completions were accepted by users. In a contrasting evaluation, Vaithilingam et al. [95] found that while most participants expressed a preference to use Copilot in their daily work, it did not necessarily improve their task completion times or success rates.…”
Section: Code-fluent Foundation Models and Human-centered Evaluations…
confidence: 94%
“…code that is free of syntax or logical errors). Nonetheless, Weisz et al. [102] found that software engineers are still interested in using such models in their work, and that the imperfect outputs of these models can even help them produce higher-quality code via human-AI collaboration [103].…”
Section: Code-fluent Foundation Models and Human-centered Evaluations…
confidence: 99%
“…Factors observed in studies evaluating UX [21] include appearance, perceptions, performance, availability, and overall satisfaction. Additionally, cognitive load [22] and efficacy [23] have also been observed as factors in intelligent user interface evaluation.…”
Section: Background and Related Work
confidence: 99%