“…Vision-based recognition of an object is the commonly adopted approach; however, several studies show that incorporating a variety of sensory modalities is key to further enhancing robots' ability to recognize multisensory object properties (see Bohg et al., 2017; Li et al., 2020 for a review). Previous work has shown that robots can recognize objects using non-visual sensory modalities such as the auditory (Torres-Jara et al., 2005; Sinapov et al., 2009; Luo et al., 2017; Eppe et al., 2018; Jin et al., 2019; Gandhi et al., 2020), the tactile (Sinapov et al., 2011b; Bhattacharjee et al., 2012; Fishel and Loeb, 2012; Kerzel et al., 2019), and the haptic sensory modalities (Natale et al., 2004; Bergquist et al., 2009; Braud et al., 2020). In addition to recognizing objects, multisensory feedback has also proven useful for learning object categories (Sinapov et al., 2014a; Högman et al., 2016; Taniguchi et al., 2018; Tatiya and Sinapov, 2019), material properties (Erickson et al., 2017, 2019; Eguíluz et al., 2018), object relations (Sinapov et al., 2014b, 2016), and, more generally, grounding linguistic descriptors (e.g., nouns and adjectives) that humans use to describe objects (Thomason et al., 2016; Richardson and Kuchenbecker, 2019; Arkin et al., 2020).…”