Speech Enhancement and Recognition in Meetings With an Audio–Visual Sensor Array

Maganti, Hari Krishna; Gática-Pérez, Daniel; McCowan, Iain

doi:10.1109/tasl.2007.906197

Cited by 54 publications

(34 citation statements)

References 42 publications


“…We next compare our approach with three other audio-visual algorithms, the beamforming based method in [16] which we refer to as Naqvi, the technique in [17], which we term as Maganti and the scheme in [38] using robust beamforming, which we refer to as Naqvi2. Similar to our work, these audiovisual methods employ the visual modality to estimate the speaker locations which are then utilized within the algorithms.…”

Section: Comparison With Other Audio-visual Methods (mentioning)

confidence: 99%

“…In [17] an audio-video multi-speaker tracker is proposed to localize sources and then separate them using microphone array beamforming. A postfiltering stage is then applied after the beamforming to further enhance the separation.…”

Section: Comparison With Other Audio-visual Methods (mentioning)

confidence: 99%
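
As a rough illustration of the beamforming step this statement describes, the sketch below steers a microphone array toward a speaker position that is assumed to come from a visual tracker. It is not the method of the cited paper: delay-and-sum is used here only as the simplest stand-in for a microphone array beamformer, the postfiltering stage is omitted, and the function name `delay_and_sum`, the array geometry, and the speed-of-sound constant are all assumptions made for the example.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value


def delay_and_sum(signals, mic_positions, source_position, fs):
    """Steer a microphone array toward a known source position.

    signals:         (n_mics, n_samples) array of microphone recordings
    mic_positions:   (n_mics, 3) microphone coordinates in metres
    source_position: (3,) speaker coordinates in metres (hypothetically
                     supplied by a visual tracker)
    fs:              sampling rate in Hz
    """
    signals = np.asarray(signals, dtype=float)
    n_samples = signals.shape[1]

    # Distance from the source to every microphone.
    offsets = np.asarray(mic_positions, dtype=float) - np.asarray(source_position, dtype=float)
    distances = np.linalg.norm(offsets, axis=1)

    # Propagation delay of each channel relative to the closest microphone.
    delays = (distances - distances.min()) / SPEED_OF_SOUND

    # Apply the compensating time advance as a phase shift per frequency bin.
    # This is a circular shift, so real recordings would need framing/padding.
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    spectra = np.fft.rfft(signals, axis=1)
    aligned = spectra * np.exp(2j * np.pi * freqs[None, :] * delays[:, None])

    # Average the aligned channels: the target adds coherently, while
    # sound arriving from other directions adds incoherently.
    return np.fft.irfft(aligned.mean(axis=0), n=n_samples)
```

Calling `delay_and_sum(recordings, mic_positions, tracked_position, fs)` returns a single enhanced channel of the same length as the input. The point of the sketch is only that a position estimate from video can replace an audio-based localization step when computing the steering delays.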

“…But in a scenario where multiple speakers are simultaneously active and the environment is highly reverberant the audio localization scheme can fail. Similarly, localization for a single active speaker based only on audio is also difficult because human speech is an intermittent signal and contains much of its energy in the low-frequency bins where spatial discrimination is imprecise, and locations estimated only by audio are also affected by noise and room reverberations [17]. Thus, the visual modality with multiple camera integration is chosen as the most suitable approach for speaker localization; combination of audio and video localization is outside the scope of this work.…”

Section: Video Processing (mentioning)

confidence: 99%


Video-Aided Model-Based Source Separation in Real Reverberant Rooms

Khan, Naqvi, ur-Rehman, et al. (2013). IEEE Trans. Audio Speech Lang. Process.


“…In application point of view, the study presented in [21] addresses the problem of distant speech acquisition in multiparty meetings using multiple cameras and microphones. The camera, used as a multi-person tracker, was used to give the more precise location of each person to the microphone array beam-former.…”

Section: B Beam-forming Based Speech Enhancement (mentioning)

confidence: 99%

Speech Enhancement Techniques: Quality vs. Intelligibility

Nemade, Shah (2014). IJFCC


Abstract: With the rapid advancements in industrial and technology applications, the demand from consumer for transmission and manipulation of data, primarily speech, audio and images, are at its peak. The speech, being a fundamental way of communication for the humans, has been embedded in various essential applications like speech recognition, voice-distance-talk and other forms of personal communications. There are so many applications of speech still to be far from reality just because of lack of efficient and reliable noise removal mechanism and preserving or improving the intelligibility for the speech signals. The broad categories of speech enhancement techniques can be listed as speech filtering techniques, beam forming techniques and active noise cancellation methods. In this paper, an attempt has been stepped towards surveying the methodologies for speech improvement. It is also interesting to investigate, how these techniques affect the performance of various application systems like speech recognition and speech communication. Essentially, we also discuss about the types and sources of noise that can be considered in speech enhancements.

Index Terms: Enhancement, beamforming and active noise cancellation, single channel enhancement

Manuscript received September 14, 2013; revised November 12, 2013.

I. INTRODUCTION

Speech is the fundamental and common medium, hence important for us, to communicate and most effective and reliable means for expressing oneself for personal communication. With advancement in hardware technologies, there are so many electronic and mobile personal communication based devices available, today in market and that too in cheaper cost and with easy availability. The applications like speech recognition, mobile and personal communication, public address system are few of the applications from long list of speech based systems. However, undesired noises in environment like sound from heavy machines, vehicles are also present in one or other form everywhere. These noises cause undesired effects in speech transmission and acquiring systems. Recently, restricted or usable vicinity of applications is moving from one place and close room to more open and multiple locations, leading to several of types undesired signals of mixing with desired speech signal making speech more corrupt with noise. Not only human communications but intelligent machines which trying to automate the things and sometimes also takes decision based on what it receives as a speech, also suffers from the degraded performance.

Since last five decades, various approaches for noise reduction and speech enhancements have been investigated and developed. Among, very early and fundamental approach of noise reduction was introduced to use the theory of the optimum Wiener filter. Given a desired signal and an input signal, the Wiener filter produces an estimate of the desired signal that is optimal, i.e. the squared mean error or difference between the signals is minimized. The Wiener filter can also be adaptively estimated us...


“…Audio localization can also be affected by noise and room environment. Additionally, audio localization is not always effective due to the complexity in the case of multiple concurrent speakers [21]. Therefore, the accuracy of the audio localization would be degraded in a multisource real room environment with noise and reverberations, but video localization is robust in such an environment.…”

Section: Introduction (mentioning)

confidence: 99%

Audio video based fast fixed-point independent vector analysis for multisource separation in a room environment

Liang, Naqvi, Chambers (2012). EURASIP J. Adv. Signal Process.


Fast fixed-point independent vector analysis (FastIVA) is an improved independent vector analysis (IVA) method, which can achieve faster and better separation performance than original IVA. As an example IVA method, it is designed to solve the permutation problem in frequency domain independent component analysis by retaining the higher order statistical dependency between frequencies during learning. However, the performance of all IVA methods is limited due to the dimensionality of the parameter space commonly encountered in practical frequency-domain source separation problems and the spherical symmetry assumed with the source model. In this article, a particular permutation problem encountered in using the FastIVA algorithm is highlighted, namely the block permutation problem. Therefore a new audio video based fast fixed-point independent vector analysis algorithm is proposed, which uses video information to provide a smart initialization for the optimization problem. The method can not only avoid the ill convergence resulting from the block permutation problem but also improve the separation performance even in noisy and highly reverberant environments. Different multisource datasets including the real audio video corpus AV16.3 are used to verify the proposed method. For the evaluation of the separation performance on real room recordings, a new pitch based evaluation criterion is also proposed.

