2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
DOI: 10.1109/iros.2018.8593838
Deep Neural Object Analysis by Interactive Auditory Exploration with a Humanoid Robot

Abstract: We present a novel approach for interactive auditory object analysis with a humanoid robot. The robot elicits sensory information by physically shaking visually indistinguishable plastic capsules. It gathers the resulting audio signals from microphones that are embedded into the robotic ears. A neural network architecture learns from these signals to analyze properties of the contents of the containers. Specifically, we evaluate the material classification and weight prediction accuracy and demonstrate that th…

Cited by 17 publications (8 citation statements)
References 18 publications
“…Recent techniques in visual question answering [49] and language grounding [55,32] allow robots to answer questions about objects and describe objects with natural language. Haptic [45,40] and auditory data [20,25] have also helped robots interpret salient features of objects beyond vision. Interactive perception can further leverage a robot's exploratory actions to reveal sensory signals that are otherwise not observable [9,56,14,61,2,8,59].…”
Section: A. Semantic Reasoning in Robotics (mentioning)
confidence: 99%
“…Vision-based recognition of an object is the commonly adopted approach; however, several research studies show incorporating a variety of sensory modalities is the key to further enhance the robotic capabilities in recognizing the multisensory object properties (see Bohg et al, 2017 ; Li et al, 2020 for a review). Previous work has shown that robots can recognize objects using non-visual sensory modalities such as the auditory (Torres-Jara et al, 2005 ; Sinapov et al, 2009 ; Luo et al, 2017 ; Eppe et al, 2018 ; Jin et al, 2019 ; Gandhi et al, 2020 ), the tactile (Sinapov et al, 2011b ; Bhattacharjee et al, 2012 ; Fishel and Loeb, 2012 ; Kerzel et al, 2019 ), and the haptic sensory modalities (Natale et al, 2004 ; Bergquist et al, 2009 ; Braud et al, 2020 ). In addition to recognizing objects, multisensory feedback has also proven useful for learning object categories (Sinapov et al, 2014a ; Högman et al, 2016 ; Taniguchi et al, 2018 ; Tatiya and Sinapov, 2019 ), material properties (Erickson et al, 2017 , 2019 ; EguĂ­luz et al, 2018 ), object relations (Sinapov et al, 2014b , 2016 ), and more generally, grounding linguistic descriptors (e.g., nouns and adjectives) that humans use to describe objects (Thomason et al, 2016 ; Richardson and Kuchenbecker, 2019 ; Arkin et al, 2020 ).…”
Section: Related Work (mentioning)
confidence: 99%
“…The sound produced by shaking has been shown to support object recognition in many settings, such as shopping malls, workshops, and homes. Eppe et al. (2018) used a humanoid robot to perform auditory exploration of a set of visually indistinguishable plastic containers filled with different amounts of different materials, showing that deep recurrent neural architectures can learn to distinguish the individual materials and estimate their weight.…”
Section: Related Work (mentioning)
confidence: 99%
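The pipeline summarized in the excerpt above (shake the container, record audio, feed features to a recurrent network that both classifies the material and regresses its weight) can be sketched as a toy forward pass. This is an illustrative sketch with randomly initialised weights, not the trained architecture from Eppe et al. (2018); the dimensions, the four material classes, and the two-head (classification plus regression) design are assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Assumed dimensions: 13 MFCCs per audio frame, 32 hidden units,
# 4 hypothetical material classes, 1 scalar weight estimate.
n_in, n_hid, n_cls = 13, 32, 4

# Randomly initialised parameters (stand-ins for trained weights).
W_xh = rng.standard_normal((n_hid, n_in)) * 0.1
W_hh = rng.standard_normal((n_hid, n_hid)) * 0.1
b_h = np.zeros(n_hid)
W_cls = rng.standard_normal((n_cls, n_hid)) * 0.1   # classification head
W_reg = rng.standard_normal((1, n_hid)) * 0.1       # weight-regression head

def forward(feature_seq):
    """Run a simple Elman RNN over a feature sequence; return
    material-class probabilities and a scalar weight estimate."""
    h = np.zeros(n_hid)
    for x in feature_seq:                    # one recurrent step per frame
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
    logits = W_cls @ h
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                     # softmax over material classes
    weight_g = float((W_reg @ h)[0])         # scalar weight prediction
    return probs, weight_g

seq = rng.standard_normal((48, n_in))        # 48 frames of placeholder features
probs, weight = forward(seq)
print(probs.shape, round(probs.sum(), 6))    # (4,) 1.0
```

The single shared hidden state feeding two output heads mirrors the general idea of analysing several object properties from one auditory exploration; a trained system would of course learn these weights from labelled shaking recordings.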
“…Feature extraction also effectively reduces the dimensionality of the data, and thus the computational cost. Related studies have successfully applied Mel-Frequency Cepstral Coefficients (MFCCs) to speech feature extraction and object recognition (Nakamura et al., 2013; Luo et al., 2017; Eppe et al., 2018). Standard MFCC features, however, capture only the static characteristics of the sound (Cao et al., 2017).…”
Section: Acoustic Dataset Collection (mentioning)
confidence: 99%
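The MFCC extraction mentioned in the excerpt above follows a standard recipe: frame the signal, window it, take the power spectrum, apply a triangular mel filterbank, take the log, and keep the first coefficients of a DCT. A minimal sketch of that recipe is below; all parameter values (16 kHz sample rate, 25 ms frames, 10 ms hop, 26 filters, 13 coefficients) are common defaults chosen for illustration, not values taken from the cited works.

```python
import numpy as np
from scipy.fft import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters with centres evenly spaced on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        lo, c, hi = bins[i - 1], bins[i], bins[i + 1]
        for k in range(lo, c):
            fb[i - 1, k] = (k - lo) / max(c - lo, 1)   # rising slope
        for k in range(c, hi):
            fb[i - 1, k] = (hi - k) / max(hi - c, 1)   # falling slope
    return fb

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512,
         n_filters=26, n_ceps=13):
    # Frame the signal, apply a Hamming window, take the power spectrum.
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Mel filterbank energies -> log -> DCT, keeping the first n_ceps.
    energies = power @ mel_filterbank(n_filters, n_fft, sr).T
    log_e = np.log(energies + 1e-10)
    return dct(log_e, type=2, axis=1, norm='ortho')[:, :n_ceps]

# Example: a 0.5 s synthetic "rattle" (gated noise bursts) at 16 kHz.
rng = np.random.default_rng(0)
gate = np.sin(2 * np.pi * 8 * np.arange(8000) / 16000) > 0
sig = rng.standard_normal(8000) * gate
feats = mfcc(sig)
print(feats.shape)  # (48, 13): one 13-dim feature vector per frame
```

Because the DCT decorrelates the log filterbank energies, the resulting low-dimensional vectors are the "static" features the excerpt refers to; dynamic variants append first- and second-order frame-to-frame differences (delta and delta-delta coefficients).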