“…As a first example, data sets cover the sensation during human-environment interaction by directly measuring (mostly adult) humans while they perform specific tasks, such as the KIT Motion-Language set for descriptions of whole-body poses (Plappert et al., 2016), the Multimodal-HHRI set for personality characterization (Celiktutan et al., 2017), and the EASE set for precise motion capturing (Meier et al., 2018). Secondly, data sets mimic the human perspective by holding objects in front of a perception device, such as a camera, to capture the diverse and complex yet general characteristics of an environment setting, e.g., Core50 (Lomonaco and Maltoni, 2017), EMMI (Wang et al., 2017), and HOD-40 (Sun et al., 2018). Thirdly, humanoid robots are employed to establish a data set in which multiple modalities, including sensorimotor information, are recorded while covering human-like actions, such as the MOD165 set (Nakamura and Nagai, 2017) and the Multimodal-HRI set (Azagra et al., 2017), or in which multiple modalities are gathered from both robot and human in turn-taking interactions, as in the HARMONIC data set (Newman et al., 2018).…”