AVSE Challenge: Audio-Visual Speech Enhancement Challenge

Blanco, Andrea Lorena Aldana; Valentini-Botinhao, Cassia; Klejch, Ondřej; Gogate, Mandar; Dashtipour, Kia; Hussain, Amir; Bell, P. J.

doi:10.1109/slt54892.2023.10023284

Cited by 9 publications

(6 citation statements)

References 23 publications


“…To create the AV dataset used in the evaluation, we selected a set of TED and TEDx videos of public lectures delivered by a single speaker. Details about the train, dev and eval AV datasets can be found in [14]. After selecting the videos, we extracted sentences based on the manual transcriptions of the talks.…”

Section: Audio-visual Evaluation Dataset (mentioning)

confidence: 99%

“…After validating our proposed method we conducted a large-scale evaluation of speech enhancement systems submitted to [14]. We evaluated nine systems (including the baseline model), and the original (i.e., not enhanced) samples.…”

Section: Evaluation of AVSE Systems (mentioning)

confidence: 99%

“…To design this closed set, choose target words and sentences we mine pre-existing stimuli using a phonetic dictionary and a language model. To validate the method we present details of the first wide scale evaluation of audio-visual speech enhancement systems, performed for the AVSE 2022 Challenge [14]. We show how the proposed paradigm resembles a transcription-based task in terms of ranking intelligibility of stimuli mixed with different maskers at different levels and that the new method is suitable for ranking the performance of AVSE systems,…”

Section: Introduction (mentioning)

confidence: 98%


Efficient Intelligibility Evaluation Using Keyword Spotting: A Study on Audio-Visual Speech Enhancement

Valentini-Botinhao, Blanco, Klejch, et al. 2023

ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)


We propose a new method for human speech intelligibility evaluation based on keyword spotting. In this method, participants play a stimulus and select the word they hear from a closed set of alternatives. To choose the sentence, the target word, and the alternatives, we mine a large set of stimuli using a phonetic dictionary and a language model. Unlike other tests, our method does not rely on specially designed sentences and can be used to evaluate in-the-wild material such as TED talks. We focus on audio-visual (AV) speech enhancement (SE) evaluation as a case study. We compared our method to a transcription task and observed that the two produce highly correlated results, although our task requires substantially less participation time. We then applied it in a large-scale evaluation of AVSE systems. Results show that keyword spotting is a suitable and efficient alternative for assessing intelligibility from AV stimuli.
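As an illustration of how responses from a closed-set keyword spotting test like the one described in this abstract might be scored, here is a minimal sketch. It assumes each response is recorded as a (system, target word, selected word) triple and that intelligibility is summarised as the proportion of correct selections per system. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def keyword_accuracy(responses):
    """Return the proportion of correct selections per system.

    `responses` is an iterable of (system, target_word, selected_word)
    triples. This layout is an assumption made for illustration.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for system, target, selected in responses:
        total[system] += 1
        # Case-insensitive comparison of the chosen word with the target.
        if selected.strip().lower() == target.strip().lower():
            correct[system] += 1
    return {system: correct[system] / total[system] for system in total}


def rank_systems(responses):
    """Order systems from highest to lowest accuracy (ties broken by name)."""
    accuracy = keyword_accuracy(responses)
    return sorted(accuracy, key=lambda system: (-accuracy[system], system))


# Hypothetical toy responses; not data from the evaluation.
responses = [
    ("noisy", "science", "silence"),
    ("noisy", "people", "people"),
    ("baseline", "science", "science"),
    ("baseline", "people", "people"),
]

print(keyword_accuracy(responses))
print(rank_systems(responses))
```

A real evaluation would also need to account for listener and stimulus variability; this sketch covers only the basic per-system score and ranking.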


“…Integrating them seamlessly can be a significant challenge to achieve a comprehensive and effective AV HAT technology for individuals with hearing loss. The new Audio-Visual Speech Enhancement (AVSE) Challenge takes the first step toward accomplishing this by setting benchmarks in this research area [9].…”

Section: Complexity of Integrating Multiple Technologies (mentioning)

confidence: 99%

“…processing [7]. The multi-modal aspect of AV hearing assistive technology (HAT) may provide a range of benefits for users, including the capability to selectively enhance speech based on the user's eye gaze [8,9] and lipreading-based technologies [10]. Given that speech enhancement in noisy environments is especially challenging, adding the visual aspect to hearing aid algorithms has been predicted to result in more reliable performance [11].…” (Footnote 1 in the citing paper: https://www.who.int/news-room/fact-sheets/detail/deafness-and-hearing-loss)

Section: Introduction (mentioning)

confidence: 99%

Socio-Technical Trust For Multi-Modal Hearing Assistive Technology

Williams, Azim, Piskopani, et al. 2023

2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)


The landscape of opportunity is rapidly changing for audio-visual (AV) hearing assistive technology. While hearing assistive devices, such as hearing aids, have traditionally been developed for deaf and hard of hearing (DHH) communities, the ubiquitous use of in-ear technology and recent advances in edge computing are reformulating what drives research and development in this domain. With that come new challenges to consider from the perspective of multiple stakeholders. In this position paper, we elaborate on seven key socio-technical challenges that may impede the adoption of trustworthy multi-modal hearing assistive technologies. We also draw upon a recent survey being piloted in the UK to examine perceptions of trust for audio systems in the context of human rights. We strongly encourage the research community to consider trust as a factor in developing new AV assistive hearing technologies, as trust may ultimately drive adoption of this technology within broader society.


A Study on Domain Adaptation for Audio-Visual Speech Enhancement

Wang, Chen, et al. 2024

Communications in Computer and Information Science
