2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2017.7953212

Novel Amplitude Scaling Method for Bilinear Frequency Warping-Based Voice Conversion

Cited by 7 publications (3 citation statements) | References 25 publications

“…We quantize the parameter α to a set of values within a feasible range [α_min, α_max], and consider three ways of forming a region of support (ROS): relaxed, intermediate, and strict. It is worth noting that previous efforts on using bilinear frequency warping for VC mostly operate on cepstral features [34]-[35], where a transform matrix is a function of α, but the warping constraint would not form a ROS to define a sparse matrix.…”
Section: B. Frequency Warping Constrained ROS Embedding
confidence: 99%
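
The bilinear warping referred to in this statement is driven by a single all-pass factor α, and on cepstral features it acts as a linear map whose matrix depends on α. The sketch below only illustrates that point and is not the cited papers' implementation: it assumes plain NumPy, uses the standard Oppenheim-style frequency-transformation recursion (the same recursion behind SPTK's `freqt`) to build a dense warping matrix W(α), and quantizes α on a purely hypothetical grid.

```python
import numpy as np

def freqt(c, order_out, alpha):
    """Warp a cepstrum c[0..M] with the bilinear (all-pass) transform of factor alpha.

    Standard frequency-transformation recursion; returns order_out + 1
    warped cepstral coefficients.
    """
    g = np.zeros(order_out + 1)
    for ci in np.asarray(c, dtype=float)[::-1]:      # feed c[M], ..., c[0]
        prev = g.copy()
        g[0] = ci + alpha * prev[0]
        if order_out >= 1:
            g[1] = (1.0 - alpha ** 2) * prev[0] + alpha * prev[1]
        for m in range(2, order_out + 1):
            g[m] = prev[m - 1] + alpha * (prev[m] - g[m - 1])
    return g

def warp_matrix(order_in, order_out, alpha):
    """Dense cepstral warping matrix W(alpha), so that warped_cep = W @ cep.

    Because the warp is linear in the cepstrum, each column is simply the
    warp of one basis vector.
    """
    W = np.empty((order_out + 1, order_in + 1))
    for j in range(order_in + 1):
        e = np.zeros(order_in + 1)
        e[j] = 1.0
        W[:, j] = freqt(e, order_out, alpha)
    return W

# Quantize alpha over a feasible range, in the spirit of the quoted passage
# (the range [-0.2, 0.2] and the 9-point grid are hypothetical choices).
alphas = np.linspace(-0.2, 0.2, 9)
warp_mats = {round(a, 3): warp_matrix(24, 24, a) for a in alphas}
```

Note that W(α) built this way is dense, which is exactly the contrast drawn in the quote: the warping constraint on cepstra does not by itself yield a region of support that defines a sparse matrix.
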
“…Here, we propose to use unsupervised Vocal Tract Length Normalization (VTLN)-based warped Gaussian Posteriorgram (GP) features as the speaker-independent representations. Earlier, stand-alone VTLN and frequency warping-based techniques were used in VC [18, 19]. We performed experiments on a small subset of the publicly available first Voice Conversion Challenge (VCC) 2016 database [20].…”
Section: Introduction
confidence: 99%
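
As background on the Gaussian Posteriorgram (GP) features mentioned above: a GP frame is the vector of posterior probabilities of the components of a GMM fitted, without labels, to spectral features. The sketch below is a generic illustration under that reading, assuming scikit-learn's `GaussianMixture` and precomputed MFCC frames; the VTLN warping applied in the cited work is omitted here.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def gaussian_posteriorgram(train_frames, test_frames, n_components=64, seed=0):
    """Return a (frames x components) Gaussian posteriorgram.

    train_frames, test_frames: arrays of shape (num_frames, feature_dim),
    e.g. MFCCs pooled over many speakers. The GMM is fitted unsupervised,
    and each test frame is then represented by its component posteriors.
    """
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="diag",
                          random_state=seed)
    gmm.fit(train_frames)
    return gmm.predict_proba(test_frames)    # each row sums to 1
```

Frames from different speakers carrying the same phonetic content tend to yield similar posterior vectors, which is what makes such a representation attractive as a speaker-independent intermediate for conversion.
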
“…VC can broadly be categorized into parallel (both speakers have spoken the same utterances) and non-parallel cases (the speakers have spoken different utterances, from the same language or from different languages). Stand-alone VC techniques based on Gaussian Mixture Models (GMM) [2, 3], frequency warping (FW) [4, 5], exemplars [6], and Deep Neural Networks (DNN) [7]-[9] require aligned spectral features before learning the mapping function. In the VC literature, it has been shown that alignment accuracy clearly affects the quality of the converted speech signal [10]-[12].…”
Section: Introduction
confidence: 99%
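
Since this statement stresses that parallel VC methods depend on how accurately source and target spectral features are aligned, a minimal dynamic time warping (DTW) alignment sketch is given below. It is a plain NumPy baseline for illustration, not the alignment procedure of any of the cited works.

```python
import numpy as np

def dtw_align(src, tgt):
    """Align two feature sequences (frames x dims) with classic DTW.

    Returns a list of (src_index, tgt_index) pairs; downstream VC training
    would stack the paired spectral vectors before learning a mapping.
    """
    n, m = len(src), len(tgt)
    # Local Euclidean distance between every pair of frames.
    dist = np.linalg.norm(src[:, None, :] - tgt[None, :, :], axis=-1)
    # Accumulated cost with the standard (diagonal, vertical, horizontal) steps.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(acc[i - 1, j],
                                                 acc[i, j - 1],
                                                 acc[i - 1, j - 1])
    # Backtrack the optimal warping path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

# Hypothetical usage: pairs = dtw_align(src_mcep, tgt_mcep), then the paired
# frames feed a GMM-, FW-, exemplar-, or DNN-based mapping.
```
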