Sharkzor

Pirrung, Meg; Hilliard, Nathan; O’Brien, Nancy; Yankov, A.; Corley, Court; Hodas, Nathan O.

doi:10.1145/3180308.3180337

Cited by 9 publications

(1 citation statement)

References 0 publications


“…While these works provide flexibility in the definition of the target to users, the flexibility of task description is inevitably limited to classification. Some prior work proposed other forms of user-customizable IML systems where users can register their own target objects [1], [29], [38], create their own rules for image search [19], or customize feature space for data sorting [24], [48]. However, these cases still focus on task-specific customization scenarios and users cannot fully control the task definition.…”

Section: Interactive Machine Learning

mentioning

confidence: 99%

Image-to-Text Translation for Interactive Image Recognition: A Comparative User Study with Non-expert Users

Kawabe, Sugano

2024

Journal of Information Processing


Interactive machine learning (IML) allows users to build their custom machine learning models without expert knowledge. While most existing IML systems are designed with classification algorithms, they sometimes oversimplify the capabilities of machine learning algorithms and restrict the user's task definition. On the other hand, as recent large-scale language models have shown, natural language representation has the potential to enable more flexible and generic task descriptions. Models that take images as input and output text have the potential to represent a variety of tasks by providing appropriate text labels for training. However, the effect of introducing text labels to IML system design has never been investigated. In this work, we aim to investigate the difference between image-to-text translation and image classification for IML systems. Using our prototype systems, we conducted a comparative user study with non-expert users, where participants solved various tasks. Our results demonstrate the underlying difficulty for users in properly defining image recognition tasks while highlighting the potential and challenges of interactive image-to-text translation systems.
