Natural user interfaces (NUIs) have been used to reduce driver distraction while operating in-vehicle infotainment systems (IVIS), and multimodal interfaces have been applied to compensate for the shortcomings of any single modality in an NUI. These multimodal NUIs have varying effects on different types of driver distraction and on different stages of drivers' secondary tasks. However, current studies provide only a limited understanding of NUIs: the design of multimodal NUIs is typically based on evaluating the strengths of individual modalities in isolation, and existing studies of multimodal NUIs do not use equivalent comparison conditions. To address this gap, we compared five single modalities commonly used for NUIs (touch, mid-air gesture, speech, gaze, and physical buttons on the steering wheel) during a lane change task (LCT) to provide a more holistic view of driver distraction. Our findings suggest that the best approach is a cascaded multimodal interface that accounts for the characteristics of each single modality. We therefore compared several combinations of cascaded modalities, matching the characteristics of each modality to the sequential phases of the command input process. Our results show that the combinations speech + button, speech + touch, and gaze + button are the best cascaded multimodal interfaces for reducing driver distraction with IVIS.

INDEX TERMS Cascaded multimodal interface, driver distraction, head-up display (HUD), human-computer interaction (HCI), in-vehicle infotainment system (IVIS), learning effect, natural user interface (NUI).