2007
DOI: 10.1109/tasl.2007.906197

Speech Enhancement and Recognition in Meetings With an Audio–Visual Sensor Array

Cited by 54 publications (34 citation statements)
References 42 publications
“…We next compare our approach with three other audio-visual algorithms, the beamforming based method in [16] which we refer to as Naqvi, the technique in [17], which we term as Maganti and the scheme in [38] using robust beamforming, which we refer to as Naqvi2. Similar to our work, these audiovisual methods employ the visual modality to estimate the speaker locations which are then utilized within the algorithms.…”
Section: Comparison With Other Audio-visual Methods (mentioning)
confidence: 99%
“…In [17] an audio-video multi-speaker tracker is proposed to localize sources and then separate them using microphone array beamforming. A postfiltering stage is then applied after the beamforming to further enhance the separation.…”
Section: Comparison With Other Audio-visual Methods (mentioning)
confidence: 99%
“…In application point of view, the study presented in [21] addresses the problem of distant speech acquisition in multiparty meetings using multiple cameras and microphones. The camera, used as a multi-person tracker, was used to give the more precise location of each person to the microphone array beam-former.…”
Section: B Beam-forming Based Speech Enhancement (mentioning)
confidence: 99%
“…Audio localization can also be affected by noise and room environment. Additionally, audio localization is not always effective due to the complexity in the case of multiple concurrent speakers [21]. Therefore, the accuracy of the audio localization would be degraded in a multisource real room environment with noise and reverberations, but video localization is robust in such an environment.…”
Section: Introduction (mentioning)
confidence: 99%