ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
DOI: 10.1109/icassp43922.2022.9747395

One Model to Enhance Them All: Array Geometry Agnostic Multi-Channel Personalized Speech Enhancement

Cited by 18 publications (17 citation statements)
References 21 publications
“…Missing multi-stream data has been used to attain better performance on tasks such as speech enhancement (Taherian et al, 2022). However, speech enhancement is only partially dependent on spatial information, which is harder to recover.…”
Section: Related Work (mentioning)
confidence: 99%
“…TSOS measures the degree of removal of the target speaker's speech segments and is critical for PSE since removing the target speech hampers effective conversations and degrades the transcription quality, as reported in [8]. Furthermore, Taherian et al [5] extended [4] to multi-channel scenarios by proposing a model that works with any microphone numbers and array geometries. Although the models of [4] can run on PCs in realtime, the computational cost was still too high for real usage as the audio processing can use only a tiny fraction of the available resources on devices.…”
Section: Related Work (mentioning)
confidence: 99%
“…Personalized speech enhancement (PSE) provides an improvement to the general SE approach by using prior knowledge about a target speaker [2,3,4,5]. One exemplary approach to PSE is to extract a speaker embedding vector from a short enrollment audio sample of the target speaker and feed it to an SE model.…”
Section: Introduction (mentioning)
confidence: 99%
“…Several studies developed causal PSE models utilizing a speaker embedding vector to extract the target speaker's voice. [1,2,3,7,8]. Giri et al proposed a perceptually motivated PSE model with low complexity [2].…”
Section: Related Work (mentioning)
confidence: 99%
“…Meanwhile, personalized speech enhancement (PSE) is gaining increased attention from the research community. PSE utilizes additional cues such as a speaker embedding vector of a target speaker to enhance only the speaker's signal even when interfering speech and background noise are both present [1,2,3]. The PSE task may be regarded as a combination of speech separation, enhancement, and speaker verification tasks.…”
Section: Introduction (mentioning)
confidence: 99%