This letter proposes a new time-domain absorption approach designed to reduce masking components of speech signals under noisy-reverberant conditions. In this method, the non-stationarity of corrupted signal segments is used to detect masking distortions based on a defined threshold. The non-stationarity is objectively measured and is also used to determine the absorption procedure. Additionally, no prior knowledge of speech statistics or room information is required for this technique. Two intelligibility measures (ESII and ASII_ST) are used for objective evaluation. The results show that the proposed scheme leads to a higher intelligibility improvement than competing methods. A perceptual listening test corroborates these results. Furthermore, the updated version of the SRMR quality measure (SRMRnorm) demonstrates that the proposed technique also attains quality improvement.
In this work, a metric learning-based approach is proposed for non-stationary acoustic source classification. A classic time-frequency representation of acoustic signals is used as the input of a convolutional neural network in order to generate embedded features of reduced size. The embedding generation is optimized on similarity constraints in order to minimize intra-class and maximize inter-class distances. Eight sources with different degrees of non-stationarity are selected for the acoustic source classification task. Experiments demonstrate that the proposed solution outperforms the baseline systems for all individual acoustic sources, leading to an increase in the average balanced accuracy of more than twenty percentage points.
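The similarity constraint described above can be illustrated with a triplet margin loss, one common metric-learning objective. The abstract does not state which loss the authors use, so this is only a minimal sketch under that assumption; the function names, the margin value, and the toy two-dimensional embeddings are all hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet margin loss (an assumed choice, not confirmed by the source).

    The loss is zero once the same-class sample (positive) is closer to the
    anchor than the different-class sample (negative) by at least `margin`.
    """
    d_pos = euclidean(anchor, positive)  # intra-class distance, to be reduced
    d_neg = euclidean(anchor, negative)  # inter-class distance, to be enlarged
    return max(0.0, d_pos - d_neg + margin)


# Toy embeddings (hypothetical values, not taken from the paper).
anchor = [0.0, 0.0]
same_class = [0.0, 1.0]    # distance 1 from the anchor
other_class = [3.0, 4.0]   # distance 5 from the anchor

# Constraint satisfied: the same-class sample is much closer than the other.
well_separated = triplet_loss(anchor, same_class, other_class)

# Constraint violated: roles swapped, so the "positive" is the far sample.
violating = triplet_loss(anchor, other_class, same_class)
```

In a training loop, the embeddings would come from the convolutional network, and minimizing this loss over many triplets would pull same-class embeddings together while pushing different-class embeddings apart.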
In this paper, a two-stage time-domain technique is proposed to improve the intelligibility of speech signals under noisy-reverberant conditions. In this method, the NNESE and ARA_NSD methods are applied jointly to mitigate the effects of noise and reverberation separately. Additionally, the resulting approach is adaptive in the sense that no prior knowledge of speech statistics or room information is required. Two intelligibility measures (ASII_ST and ESII) are used for objective evaluation. The results show that the proposed two-stage scheme leads to a higher intelligibility improvement than competing methods, especially for low SNR values. Furthermore, the PESQ and the updated version of the SRMR quality measure (SRMRnorm) demonstrate that the proposed technique also attains quality improvement.