Human subjective evaluation is the "gold standard" to evaluate speech quality optimized for human perception. Perceptual objective metrics serve as a proxy for subjective scores. The conventional and widely used metrics require a reference clean speech signal, which is unavailable in real recordings. Previous no-reference approaches correlate poorly with human ratings and are not widely adopted in the research community. One of the biggest use cases of these perceptual objective metrics is to evaluate noise suppression algorithms. This paper introduces a multi-stage self-teaching based perceptual objective metric that is designed to evaluate noise suppressors. The proposed method generalizes well in challenging test conditions with a high correlation to human ratings.
The Deep Noise Suppression (DNS) challenge is designed to foster innovation in the area of noise suppression to achieve superior perceptual speech quality. We recently organized a DNS challenge special session at INTERSPEECH and ICASSP 2020. We opensourced training and test datasets for the wideband scenario. We also open-sourced a subjective evaluation framework based on ITU-T standard P.808, which was also used to evaluate participants of the challenge. Many researchers from academia and industry made significant contributions to push the field forward, yet even the best noise suppressor was far from achieving superior speech quality in challenging scenarios. In this version of the challenge organized at INTERSPEECH 2021, we are expanding both our training and test datasets to accommodate full band scenarios. The two tracks in this challenge will focus on real-time denoising for (i) wide band, and (ii) full band scenarios. We are also making available a reliable nonintrusive objective speech quality metric for wide band called DNS-MOS for the participants to use during their development phase.
The Deep Noise Suppression (DNS) challenge is designed to foster innovation in the area of noise suppression to achieve superior perceptual speech quality. We recently organized a DNS challenge special session at INTERSPEECH 2020 where we open-sourced training and test datasets for researchers to train their noise suppression models. We also open-sourced a subjective evaluation framework and used the tool to evaluate and select the final winners. Many researchers from academia and industry made significant contributions to push the field forward. We also learned that as a research community, we still have a long way to go in achieving excellent speech quality in challenging noisy real-time conditions. In this challenge, we expanded both our training and test datasets. Clean speech in the training set has increased by 200% with the addition of singing voice, emotion data, and non-English languages. The test set has increased by 100% with the addition of singing, emotional, non-English (tonal and non-tonal) languages, and, personalized DNS test clips. There are two tracks with focus on (i) real-time denoising, and (ii) real-time personalized DNS. We present the challenge results at the end.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.