Social robotics has become a trend in contemporary robotics research, since social robots can be used successfully in a wide range of applications. One of the most fundamental communication skills a consumer robot must have is oral interaction with a human, in order to provide feedback or accept commands. Quite a few well-established Automatic Speech Recognition (ASR) tools exist, but they do not provide satisfactory results, especially in less popular languages and, more importantly, under noisy conditions. The current paper investigates different voice activity detection (VAD) and noise elimination methodologies for use in ASR-based oral interaction with an affordable budget robot, NAO v4. Acoustically semi-stationary environments are assumed, which, in conjunction with the high background noise of the NAO's microphones, makes successful ASR quite difficult.
Full Title: Improving multilingual interaction for consumer robots through signal enhancement in multichannel speech
Additional Information:
Question: Has this article previously been published at a conference or convention?
Response: No
Question: Please enter the text of an appropriate cover letter to accompany your submission, briefly describing the article you want to submit and saying why you believe it is suitable for publication in the AES Journal.
Response: In this paper, several single- and multi-channel VAD/noise estimation and noise elimination strategies were implemented and evaluated for ASR in acoustically semi-stationary environments. This work aims to assist the consumer robotics domain: since expensive hardware (microphones or robots in general) is usually out of the question, a robust VAD and noise reduction approach is needed. The main novelty of this paper lies not in the implementations per se but in the adaptation approach followed, which optimizes the parameters of each method in order to improve ASR results. In conclusion, we believe that the current work is suitable for submission to the AES Journal, since it addresses, from a signal processing point of view, a realistic problem that current consumer robots face.