2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2019
DOI: 10.1109/waspaa.2019.8937169

Speech Bandwidth Extension with Wavenet

Abstract: Large-scale mobile communication systems tend to contain legacy transmission channels with narrowband bottlenecks, resulting in characteristic 'telephone-quality' audio. While higher quality codecs exist, due to the scale and heterogeneity of the networks, transmitting higher sample rate audio with modern high-quality audio codecs can be difficult in practice. This paper proposes an approach where a communication node can instead extend the bandwidth of a band-limited incoming speech signal that may have been …

Cited by 26 publications (20 citation statements)
References 13 publications
“…Kuleshov et al [22] used a convolutional encoder-decoder network inspired by image super resolution. WaveNet [23] and its variants for BWE [24,25] use dilated convolutions to enable large receptive field while preserving the original resolution. Feng et al [6] used FFTNet [26] which resembles the classical FFT process.…”
Section: Related Work (mentioning)
confidence: 99%
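
The statement above says that dilated convolutions enlarge the receptive field while preserving the original resolution. A minimal sketch of that idea follows, in plain Python. The kernel size of 2 and the doubling dilation schedule are illustrative assumptions, not values taken from the cited papers, and the function names are hypothetical.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field, in samples, of stacked stride-1 dilated 1-D convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def causal_dilated_conv(x, weights, dilation):
    """Causal dilated convolution with left zero-padding.

    The output has exactly len(x) samples, so the time resolution is preserved.
    """
    k = len(weights)
    pad = (k - 1) * dilation
    padded = [0.0] * pad + list(x)
    return [
        sum(weights[j] * padded[t + j * dilation] for j in range(k))
        for t in range(len(x))
    ]


# Assumed schedule: ten layers with dilations 1, 2, 4, ..., 512.
dilations = [2 ** i for i in range(10)]
rf = receptive_field(2, dilations)

# One dilated layer applied to a short signal; the length stays at 4 samples.
y = causal_dilated_conv([1, 2, 3, 4], [1, 1], dilation=2)
```

Under these assumptions the receptive field grows exponentially with depth, since each layer adds (kernel_size - 1) * dilation samples, while every layer still emits one output per input sample.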
“…Learning such a long feature sequence poses a challenge to conventional sequential modeling networks, including recurrent neural network and 1-D convolutional networks. Dilated convolutional layers were proposed to alleviate such problem [13,14]. Despite the increased receptive field over the input signals, they have not captured the utterance-level temporal information.…”
Section: Rdpn Core Module (mentioning)
confidence: 99%
“…Prior studies [5][6][7][8][9][10] are focused on estimating high-frequency magnitude and phase spectra in frequency domain. To overcome the inherent difficulty of phase estimation, time-domain frameworks [11][12][13][14][15] are proposed, that offer competitive voice quality.…”
Section: Introduction (mentioning)
confidence: 99%
“…Early methods were based on the source-filter model of speech production and exploit DNNs to estimate the upper frequency envelope [3]. Inspired by the early success in image superresolution [4], end-to-end audio-based solutions were proposed, based on wave-to-wave UNet [5], WaveNet [6,7], hybrid time/frequency-domain models [8]. All these methods are trained by minimizing a reconstruction loss, typically the […] The design of SEANet is similar to [12], but we adopt the losses proposed in [13], in which the reconstruction loss is computed in the feature space of the discriminator, at different scales and at different layers.…”
Section: Introductionmentioning
confidence: 99%