Proceedings of the 5th International Conference on Multimodal Interfaces 2003
DOI: 10.1145/958432.958467
Augmenting user interfaces with adaptive speech commands

Abstract: We present a system that augments any unmodified Java application with an adaptive speech interface. The augmented system learns to associate spoken words and utterances with interface actions such as button clicks. Speech learning is constantly active and searches for correlations between what the user says and does. Training the interface is seamlessly integrated with using the interface. As the user performs normal actions, she may optionally verbally describe what she is doing. By using a phoneme recognize…
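The abstract describes learning correlations between utterances and interface actions. A minimal sketch of that idea, assuming a simple co-occurrence counter over utterance tokens (the class and method names here are hypothetical, not from the paper's implementation):

```python
from collections import defaultdict


class SpeechActionLearner:
    """Hypothetical sketch of correlation-based speech/action association:
    count how often each utterance token co-occurs with each UI action,
    then predict the action whose tokens score highest."""

    def __init__(self):
        # counts[token][action] = times `token` was heard while `action` occurred
        self.counts = defaultdict(lambda: defaultdict(int))

    def observe(self, utterance_tokens, action):
        """Record that the user spoke these tokens while performing `action`."""
        for token in utterance_tokens:
            self.counts[token][action] += 1

    def predict(self, utterance_tokens):
        """Return the action most strongly correlated with the utterance,
        or None if no token has been observed before."""
        scores = defaultdict(int)
        for token in utterance_tokens:
            for action, n in self.counts[token].items():
                scores[action] += n
        if not scores:
            return None
        return max(scores, key=scores.get)
```

Because observation happens on every normal interaction, training is folded into use, matching the abstract's claim that learning is "constantly active"; the real system operates on phoneme sequences rather than word tokens.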

Cited by 13 publications
(4 citation statements)
References 8 publications
“…These high level commands are manually created and PIXELTONE may be better served by learning from examples [10] or mining online tutorials.…”
Section: Discussion
confidence: 99%
“…Since these templates are based on grammatical structures of sentences, it is possible to semi-automate the process of defining templates by analyzing grammatical patterns on large corpora of users' utterances. Alternatively, users can train the system directly by verbalizing what they are doing as they are doing it [10].…”
Section: Scalability
confidence: 99%
“…We are also continuing work in applying our results from grounded language systems to multimodal interface design (Gorniak & Roy, 2003). We recently demonstrated an application of the Bishop system as described in this paper to the problem of referent resolution in the graphical user interface for a 3D modelling application, Blender (Blender Foundation, 2003).…”
Section: Lexical Entries
confidence: 90%
“…Speech-based user interfaces are considered a solid foundation when handling natural interfaces regarding people with some sort of disability: It permits the input of information without the resort to a keyboard or even in the event of the inexistence of a monitor; it facilitates tasks where hands and/or eyes of the users are busy; and it relieves the need for writing for people with motor or intellectual disabilities [34]. Some systems have been developed that take advantage of speech interfaces in order to replace quick commands given by the mouse or keyboard in graphical interfaces, with very positive results [12,20].…”
Section: Related Work
confidence: 99%