2018
DOI: 10.1121/1.5053115

An ideal quantized mask to increase intelligibility and quality of speech in noise

Abstract: Time-frequency (T-F) masks represent powerful tools to increase the intelligibility of speech in background noise. Translational relevance is provided by their accurate estimation based only on the signal-plus-noise mixture, using deep learning or other machine-learning techniques. In the current study, a technique is designed to capture the benefits of existing techniques. In the ideal quantized mask (IQM), speech and noise are partitioned into T-F units, and each unit receives one of N attenuations according…
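The abstract breaks off before it states how each T-F unit's attenuation is chosen, so the rule below is an assumption and not the paper's method. As a minimal sketch: start from an ideal ratio mask (IRM) with values in [0, 1], place N gain levels evenly on [0, 1], and snap each unit to the nearest level. The function name ideal_quantized_mask and the even spacing are hypothetical.

```python
import numpy as np


def ideal_quantized_mask(irm, n_levels):
    """Quantize an IRM to n_levels gains (hypothetical rule, assumed for illustration).

    Assumption: the allowed gains are spaced evenly on [0, 1] and each
    T-F unit takes the gain nearest to its IRM value.
    """
    if n_levels < 2:
        raise ValueError("need at least two levels")
    levels = np.linspace(0.0, 1.0, n_levels)          # candidate gains
    irm = np.clip(np.asarray(irm, dtype=float), 0.0, 1.0)
    # index of the nearest level for every T-F unit
    idx = np.abs(irm[..., None] - levels).argmin(axis=-1)
    return levels[idx]


# Example: a 2-level mask is binary, a 3-level mask adds a half-gain step.
print(ideal_quantized_mask([0.1, 0.4, 0.6, 0.9], 2))
print(ideal_quantized_mask([0.1, 0.4, 0.6, 0.9], 3))
```

With n_levels = 2 the sketch yields a binary mask; as n_levels grows, the worst-case deviation from the IRM shrinks as 1 / (2 (n_levels - 1)).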

Cited by 10 publications (5 citation statements); references 40 publications.
“…ITFS processing usually assumes a priori knowledge of the target and masker signals so that the energy relations in each time-frequency (T-F) unit may be specified exactly and, in that sense, is fundamentally different than normal human perception. The use of ITFS to separate EM from IM is based on the premise that listeners extracting speech in a mixture of sounds likely rely only (or predominantly) on the subset of T-F units in which the energy of the target source relative to the energy of the masking source(s) (e.g., Cooke, 2006; Li and Loizou, 2007; Healy and Vasko, 2018) exceeds a specified value (termed the level criterion, or LC). The T-F units in which masker energy dominates target energy logically fall under the definition of EM because of the assumption that the neural response in a small T-F unit would be driven by the properties (i.e., amplitude and timing) of the higher-energy source.…”
Section: Introduction (mentioning, confidence: 99%)
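The level-criterion idea in the statement above can be sketched as an ideal binary mask: keep a T-F unit when its local target-to-masker ratio exceeds the LC, discard it otherwise. This is a minimal sketch assuming the separate target and masker STFTs are known (the a priori knowledge the statement mentions); the function name and the small eps guard are illustrative choices.

```python
import numpy as np


def ideal_binary_mask(target_stft, masker_stft, lc_db=0.0, eps=1e-12):
    """1 where local target-to-masker ratio (dB) exceeds the level criterion LC, else 0."""
    t = np.abs(np.asarray(target_stft)) ** 2   # target energy per T-F unit
    m = np.abs(np.asarray(masker_stft)) ** 2   # masker energy per T-F unit
    local_snr_db = 10.0 * np.log10((t + eps) / (m + eps))
    return (local_snr_db > lc_db).astype(float)


# Units at +20, 0 and -20 dB local SNR, tested against two criteria.
target = np.array([10.0, 1.0, 1.0])
masker = np.array([1.0, 1.0, 10.0])
print(ideal_binary_mask(target, masker))               # LC = 0 dB
print(ideal_binary_mask(target, masker, lc_db=-5.0))   # LC = -5 dB
```

Lowering the LC retains more units, including ones where the masker is nearly as strong as the target.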
“…To further investigate the effectiveness of quantized masking approaches, we use the Chimera++ model to estimate the IQM [13], where the models are Chi++_IQM2, Chi++_IQM3, Chi++_IQM4, and Chi++_IQM8, which predict enhanced speech using IQM2, IQM3, IQM4, and IQM8, respectively. Here X in IQMX refers to the number of attenuation levels in the mask (see [13]). Additionally, we compare with a Chimera++ network that predicts quantized speech, Chi++_quant (proposed without the QSM).…”
Section: Experimental Setup and Results (mentioning, confidence: 99%)
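Predicting "enhanced speech using IQMX" ends with applying the estimated mask to the noisy mixture and resynthesizing a waveform. The sketch below assumes a common convention, not confirmed by the statement: multiply the mixture's STFT by the mask, reuse the noisy phase, and invert with overlap-add. The frame length, hop, and helper names are hypothetical, and the mask would come from a model such as Chimera++.

```python
import numpy as np

# Hypothetical analysis settings (not taken from the cited work).
FRAME = 512          # samples per frame
HOP = FRAME // 2     # 50% overlap


def hann_periodic(n):
    # Periodic Hann window: copies shifted by n/2 sum to exactly 1.
    return 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n) / n)


def stft(x):
    """Complex STFT with shape (frames, FRAME // 2 + 1)."""
    padded = np.pad(x, (FRAME, FRAME))  # every original sample lies in two frames
    n_frames = 1 + (len(padded) - FRAME) // HOP
    w = hann_periodic(FRAME)
    frames = np.stack([padded[i * HOP:i * HOP + FRAME] * w for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(spec, length):
    """Overlap-add inverse matching stft(); trims the padding."""
    frames = np.fft.irfft(spec, n=FRAME, axis=1)
    out = np.zeros(HOP * (len(frames) - 1) + FRAME)
    for i, frame in enumerate(frames):
        out[i * HOP:i * HOP + FRAME] += frame
    return out[FRAME:FRAME + length]


def enhance(mixture, mask):
    """Scale each T-F unit of the mixture by the mask; the noisy phase is kept."""
    spec = stft(mixture)
    if mask.shape != spec.shape:
        raise ValueError(f"mask shape {mask.shape} != STFT shape {spec.shape}")
    return istft(mask * spec, len(mixture))


# Sanity check: an all-ones mask returns the mixture unchanged.
rng = np.random.default_rng(0)
mix = rng.standard_normal(16000)
print(np.allclose(enhance(mix, np.ones(stft(mix).shape)), mix))
```

The periodic Hann window at 50% overlap sums to one, so analysis-only windowing with plain overlap-add reconstructs the signal exactly when the mask is all ones.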
“…In [12], separate discrete T-F masks for magnitude and phase responses are estimated using softmax activations, where recurrent networks are used to capture temporal correlations. The ideal quantized mask (IQM) has also recently been proposed [13]. It shows that quantization of the IRM, by coding each T-F IRM value into one of a number of quantized bins, is a reasonable representation of the IRM as assessed by human listeners.…”
Section: Introduction (mentioning, confidence: 99%)
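Estimating a discrete mask with softmax activations turns mask estimation into per-unit classification: each IRM value is coded into one of a fixed number of bins, the network outputs one score per bin, and the scores are decoded back to a gain. The sketch assumes uniform bins on [0, 1] and bin midpoints as the decoded values; both choices, and all function names, are hypothetical rather than the cited papers' exact scheme.

```python
import numpy as np


def irm_to_labels(irm, n_bins):
    """Class label per T-F unit, assuming uniform bins on [0, 1]."""
    irm = np.clip(np.asarray(irm, dtype=float), 0.0, 1.0)
    return np.minimum((irm * n_bins).astype(int), n_bins - 1)


def bin_centers(n_bins):
    """Representative mask value for each bin (its midpoint)."""
    return (np.arange(n_bins) + 0.5) / n_bins


def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def cross_entropy(logits, labels):
    """Mean negative log-probability of the target bin over all T-F units."""
    probs = softmax(logits)
    picked = np.take_along_axis(probs, labels[..., None], axis=-1)[..., 0]
    return float(-np.mean(np.log(picked + 1e-12)))


def decode(logits, n_bins, mode="argmax"):
    """Map per-unit bin scores back to mask values."""
    centers = bin_centers(n_bins)
    if mode == "argmax":
        return centers[np.argmax(logits, axis=-1)]
    return softmax(logits) @ centers  # probability-weighted (expected) mask value


irm = np.array([[0.0, 0.3], [0.99, 1.0]])
labels = irm_to_labels(irm, 4)
print(labels)
```

The argmax decoding keeps the mask strictly quantized; the expected-value mode trades that for a smoother estimate when the network is uncertain between neighboring bins.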
“…3) IQM: The IQM mask [8] uses IRM(k, q) as the basis for obtaining N possible values of IQM(k, q). This is done by selecting points of the IRM mask according to:…”
Section: B. Ideal Acoustic Masks (unclassified)
“…The IRM (Ideal Ratio Mask) [7] defines fractional gain values in each T-F unit according to the ratio between the energy of the clean signal and that of the corrupted signal. Based on the IRM, the IQM (Ideal Quantized Mask) uses discrete attenuation levels [8]. Among the blind masks, the BRM (Binary Reverberant Mask) improved intelligibility in reverberant environments in subjective tests [9].…”
Section: Introduction (unclassified)
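The IRM described in the last statement, a per-unit ratio of clean-signal energy to corrupted-signal energy, can be sketched two ways. The first follows that wording directly and clips to [0, 1], because speech-noise cross terms can make the mixture energy smaller than the clean energy. The second is a widely used form that assumes speech and noise are uncorrelated, so the mixture energy is approximated by their sum, with an optional exponent beta. Both the clipping and beta are assumptions here, not details from [7], and the function names are hypothetical.

```python
import numpy as np


def irm_energy_ratio(clean_stft, noisy_stft, eps=1e-12):
    """|S|^2 / |Y|^2 per T-F unit, clipped to [0, 1] (clipping is an assumption)."""
    ratio = np.abs(clean_stft) ** 2 / (np.abs(noisy_stft) ** 2 + eps)
    return np.clip(ratio, 0.0, 1.0)


def irm_uncorrelated(clean_stft, noise_stft, beta=0.5, eps=1e-12):
    """(|S|^2 / (|S|^2 + |N|^2)) ** beta, assuming uncorrelated speech and noise."""
    s2 = np.abs(clean_stft) ** 2
    n2 = np.abs(noise_stft) ** 2
    return (s2 / (s2 + n2 + eps)) ** beta


S = np.array([1 + 0j, 0j, 3 + 0j])   # toy clean-speech STFT values
N = np.array([1 + 0j, 1 + 0j, 0j])   # toy noise STFT values
print(irm_uncorrelated(S, N, beta=1.0))
print(irm_energy_ratio(S, S + N))
```

The two forms agree when speech and noise add in power; they diverge in units where the phases of speech and noise partly cancel or reinforce.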