2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2018.8461568
Multimodal Signal Processing and Learning Aspects of Human-Robot Interaction for an Assistive Bathing Robot

Abstract: We explore new aspects of assistive living on smart human-robot interaction (HRI) that involve automatic recognition and online validation of speech and gestures in a natural interface, providing social features for HRI. We introduce a whole framework and resources of a real-life scenario for elderly subjects supported by an assistive bathing robot, addressing health and hygiene care issues. We contribute a new dataset and a suite of tools used for data acquisition and a state-of-the-art pipeline for multimoda…

Cited by 20 publications (15 citation statements)
References 17 publications
“…In order to build our models, we performed offline classification experiments on pre-segmented commands, based on a task-dependent grammar of 23 German spoken commands, achieving accuracies of 76% and 68% for the two tasks. For a more detailed analysis and further results regarding the offline experiments of the audio-gestural HRI module we refer the reader to [59].…”
Section: Audio Processing (mentioning)
confidence: 99%
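The offline evaluation summarized in this statement (classifying each pre-segmented utterance against a fixed task grammar and reporting per-task accuracy) can be illustrated with a minimal sketch. The grammar entries, the score function, and the data layout below are hypothetical placeholders for illustration, not the actual recognizer or setup of [59].

```python
# Minimal sketch of grammar-constrained offline command classification,
# assuming pre-segmented utterances and a fixed task-dependent grammar.
# COMMAND_GRAMMAR entries and score_fn are hypothetical, not the real setup.
from typing import Callable, Dict, List, Tuple

# Hypothetical grammar: each allowed spoken command maps to a class id.
COMMAND_GRAMMAR: Dict[str, int] = {
    "wasche ruecken": 0,  # "wash back"
    "stopp": 1,           # "stop"
    # ...the real grammar contains 23 German commands
}

def classify_command(features: object,
                     score_fn: Callable[[object, str], float]) -> str:
    """Return the grammar entry with the highest recognizer score."""
    return max(COMMAND_GRAMMAR, key=lambda cmd: score_fn(features, cmd))

def offline_accuracy(dataset: List[Tuple[object, str]],
                     score_fn: Callable[[object, str], float]) -> float:
    """Fraction of pre-segmented commands classified correctly."""
    if not dataset:
        return 0.0
    correct = sum(classify_command(feats, score_fn) == ref
                  for feats, ref in dataset)
    return correct / len(dataset)
```

Running `offline_accuracy` separately on the two tasks' test sets would yield the kind of per-task accuracy figures (76% and 68%) reported above.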
“…Surpassing human-level performance propelled research in applications where different modalities, among language, vision, sensory data, and text, play an essential role in accurate predictions and identification [45]. Several state-of-the-art approaches to multimodal fusion employing deep learning models have been proposed in the literature, such as those presented by F. Ramzan et al. [24], A. Zlatintsi et al. [25], M. Dhouib and S. Masmoudi [26], Y. D. Zhang et al. [27], C. Devaguptapu et al. [28], and P. Narkhede et al. [46]. The purpose of these approaches is to enhance the multimodal fusion method with which objects can be efficiently detected from static images or video sequences, preferably using a deep learning library.…”
Section: Theoretical Background (mentioning)
confidence: 99%
“…In contrast, the previous approaches [16, 17, 18, 19, 20] preferred to use holistic techniques based on the silhouette in visible (RGB) imaging to identify the individual. (2) The proposed approach implements a YOLOv3 model-based method for fusing face and gait, which performs better than related methods based on CNN models [17, 21, 24, 25], SVM and ANN techniques [26], and YOLO models [27, 28]. Moreover, our approach seeks to increase recognition accuracy without combining visible (RGB) imaging features with thermal imaging features, compared to related methods on the same night dataset.…”
Section: Introduction (mentioning)
confidence: 99%
“…The respective classifiers' output scores for all possible intentions were fused by a weighted linear combination with tunable weights, and the intention with the highest fused score was predicted. Another work operating on classifier outputs for fusion, by Zlatintsi et al. [11], proposed an intention recognition system for an assistive bathing robot based on speech and gestures. They applied a late fusion scheme in which an intention was chosen as the detected one if it was ranked highest by the speech classifier and was among the two highest-ranked intentions according to the observed gestures.…”
Section: Related Work (mentioning)
confidence: 99%
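The two score-level fusion rules summarized in this statement can be sketched concretely. The intention labels, score dictionaries, and weight values below are illustrative assumptions; this is a minimal sketch of the described decision rules, not the cited systems' actual implementations.

```python
# Minimal sketch of the two fusion rules described above; labels, scores,
# and weights are illustrative assumptions, not the cited systems' code.
from typing import Dict, Optional

def weighted_linear_fusion(speech_scores: Dict[str, float],
                           gesture_scores: Dict[str, float],
                           w_speech: float = 0.6,   # tunable weight (hypothetical)
                           w_gesture: float = 0.4) -> str:
    """Fuse per-intention classifier scores by a weighted linear combination
    and return the intention with the highest fused score."""
    intents = set(speech_scores) | set(gesture_scores)
    fused = {i: w_speech * speech_scores.get(i, 0.0)
                + w_gesture * gesture_scores.get(i, 0.0)
             for i in intents}
    return max(fused, key=fused.get)

def late_fusion_rank_rule(speech_scores: Dict[str, float],
                          gesture_scores: Dict[str, float]) -> Optional[str]:
    """Late fusion as described for the assistive bathing robot [11]: accept
    the speech classifier's top-ranked intention only if it is also among the
    two highest-ranked intentions according to the observed gestures."""
    speech_top = max(speech_scores, key=speech_scores.get)
    gesture_top2 = sorted(gesture_scores, key=gesture_scores.get,
                          reverse=True)[:2]
    return speech_top if speech_top in gesture_top2 else None
```

The rank-based rule returns `None` when the modalities disagree, which is one natural way to model "no validated intention detected"; a real system might instead fall back to asking the user for confirmation.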