Proceedings of the 2014 ACM/IEEE International Conference on Human-Robot Interaction 2014
DOI: 10.1145/2559636.2563706
Speaker identification using three signal voice domains during human-robot interaction

Abstract: This LBR describes a novel method for user recognition in HRI based on analyzing the peculiarities of users' voices, specifically designed for use in a robotic system. The method is inspired by acoustic fingerprinting techniques and consists of two phases: a) enrollment in the system, in which the features of the user's voice are stored in files called voiceprints; b) a searching phase, in which the features extracted in real time are compared with the voiceprints using a pattern-matching method to obtain the most likely user…
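The two-phase scheme in the abstract could be sketched as follows. This is a minimal illustration, not the paper's method: the feature vectors, the `VoiceprintDB` name, and the nearest-neighbour matcher are all assumptions, since the excerpt does not detail the actual features or pattern-matching algorithm used.

```python
import numpy as np

class VoiceprintDB:
    """Sketch of voiceprint-based user identification.

    Phase a) enroll(): store one feature vector per user (the "voiceprint").
    Phase b) identify(): compare a query vector against all stored
    voiceprints and return the closest match.
    """

    def __init__(self):
        self.voiceprints = {}  # user name -> feature vector

    def enroll(self, user, features):
        # Enrollment phase: persist the user's voice features.
        self.voiceprints[user] = np.asarray(features, dtype=float)

    def identify(self, features):
        # Searching phase: nearest neighbour by Euclidean distance,
        # standing in for the paper's pattern-matching step.
        q = np.asarray(features, dtype=float)
        return min(self.voiceprints,
                   key=lambda u: float(np.linalg.norm(self.voiceprints[u] - q)))
```

For example, after enrolling two users with distinct feature vectors, a slightly noisy query vector for the first user should still resolve to that user.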

Cited by 8 publications (9 citation statements) · References 6 publications
“…This desktop robot has been employed in stimulation sessions for mildly cognitively impaired elderly people [34]. The platform offers multiple interaction interfaces such as automatic speech recognition [35], voice activity detection [36], user recognition [37], user localization [38], user identification [39], emotion detection [40], and TTS capabilities, as well as a 3D camera. This device was a Kinect for Xbox One with a colour resolution of 1920 × 1080 pixels at 30 frames per second (limited in our study to 10 fps) and a depth resolution of 512 × 424 points at the same frame rate.…”
Section: Experiments Description
Citation type: mentioning, confidence: 99%
“…The echoes of the voice signals due to the walls and other objects affect its reliability. Alonso et al. perceived humans' utterances and analyzed them using signal processing algorithms to localize and identify users interacting with a robot (Alonso-Martin et al., 2014; Alonso-Martín et al., 2012). For the identification process, a success rate of about 70% is obtained, considering up to eight different users speaking at different times.…”
Section: Object and User Identification in Robotics
Citation type: mentioning, confidence: 99%
“…Here, we briefly describe the RDS to facilitate understanding of the rest of this paper. More details can be found in [22–24, 32]. The RDS is intended to manage the interaction between a robot and one or more users.…”
Section: The Robotics Dialog System: The Framework for the Augmented
Citation type: mentioning, confidence: 99%
“…Notice that both the information extraction and the information enrichment modules are in execution concurrently with the other perception skills. For instance, there is a skill that extracts some features of the user, analyzing the voice footprints, such as his/her name, gender [22], or the main emotion, using multimodal information [21]; also, there is another skill that localizes the external sound source in space [32], so it gives 2D information about where the user is. The multimodal fusion module groups together the enriched information with the data provided by the other perception skills [24].…”
Section: Proof of Concept: HRI and ARDS
Citation type: mentioning, confidence: 99%