High-Quality Many-to-Many Voice Conversion Using Transitive Star Generative Adversarial Networks with Adaptive Instance Normalization

Li, Yanping; He, Zhengtao; Yan, Zhang; Yang, Zhen

doi:10.1142/s0218126621501887

Cited by 3 publications

(2 citation statements)

References 30 publications


“…Because GANs have good learning ability and the ability to simulate data distribution, they have been widely concerned in the field of machine learning. They show excellent performance in image generation [12], image translation [13], image enhancement [14], speech generation [15] and speech conversion [16]. GANs are composed of two neural networks: generator and discriminator.…”

Section: Introduction

mentioning

confidence: 99%

Emotion Speech Synthesis Method Based on Multi-Channel Time–Frequency Domain Generative Adversarial Networks (MC-TFD GANs) and Mixup

Jia, Chen. 2021. Arab J Sci Eng


As one of the most challenging and promising topics in the speech field, emotion speech synthesis is a hot topic in current research. At present, the emotion expression ability, synthesis speed and robustness of synthetic speech need to be improved. Cycle-consistent Adversarial Networks (CycleGAN) provide a two-way breakthrough in the transformation of emotional corpus information, but there is still a gap between the real target and the synthesized speech. In order to narrow this gap, we propose an emotion speech synthesis method combining Multi-Channel Time–Frequency Domain Generative Adversarial Networks (MC-TFD GANs) and Mixup. It includes three stages: MC-TFD GANs, loss estimation based on Mixup, and effective emotion region stacking based on Mixup. Among them, the gating unit GTLU (gated tanh linear units) and the image expression method of the speech saliency region are designed. The first stage combines the time–frequency domain MaskCycleGAN based on the improved GTLU with the time-domain CycleGAN based on the saliency region to form the multi-channel GAN. Based on the Mixup method, the calculation method of the loss and the aggravation degree of the emotion region are designed. Comparative experiments against several popular speech synthesis methods were carried out on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) corpus. A bi-directional three-layer long short-term memory (LSTM) model was used as the verification model. The experimental results showed that the mean opinion score (MOS) and the unweighted accuracy (UA) of the speech generated by the synthesis method improved by 4% and 2.7%, respectively. The current model was superior to existing GAN models in subjective evaluation and objective experiments, ensuring that the speech generated by this model had higher reliability, better fluency and better emotional expression ability.


“…2. Strengthen the use of pixel information around the image[31]. The advantages of IN are as follows: all elements of a single sample and a single channel are considered when calculating the normalized statistics.…”

mentioning

confidence: 99%
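The citation statement above describes instance normalization (IN) as computing its statistics over all elements of a single sample and a single channel. A minimal sketch of that statistic follows, in plain Python. The function name `instance_norm`, the list-of-channels layout and the `eps` value are assumptions made for illustration; they are not taken from the cited paper.

```python
import math


def instance_norm(sample, eps=1e-5):
    """Normalize each channel of ONE sample with that channel's own statistics.

    `sample` is a list of channels; each channel is a flat list of floats.
    (Hypothetical layout chosen for this sketch.)
    """
    normalized = []
    for channel in sample:
        # Mean and variance use every element of this sample's channel only;
        # no other sample and no other channel contributes.
        mean = sum(channel) / len(channel)
        var = sum((x - mean) ** 2 for x in channel) / len(channel)
        scale = 1.0 / math.sqrt(var + eps)  # eps guards against division by zero
        normalized.append([(x - mean) * scale for x in channel])
    return normalized


# Two channels with very different ranges end up on a comparable scale.
out = instance_norm([[1.0, 2.0, 3.0], [10.0, 10.0, 40.0]])
```

Because the statistics are per sample and per channel, the result for one sample does not depend on what else is in the batch.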

Shadow removal method of soil surface image based on GAN used for estimation of farmland soil moisture content

Meng, Yang, Wang et al. 2023. Meas. Sci. Technol.


It is important to obtain soil moisture content (SMC) in farmland, and soil surface images can be used to rapidly estimate SMC. The objective of this study was to propose a shadow removal algorithm that eliminates the effect of shadows in soil surface images, so as to improve the accuracy of SMC estimation. The proposed Soil Shadow Generative Adversarial Network (SS GAN) has a cyclic structure; it is an unsupervised method and does not require paired shadow image sets for network training. Four loss functions were defined for the network to effectively remove shadows and ensure texture detail and color consistency. The method was compared with traditional methods and with supervised and unsupervised deep learning techniques in comparative experiments, using both visual and quantitative evaluation. Visually, the proposed method performed best, leaving almost no visible shadow boundaries or shadow areas in the samples. The Peak Signal to Noise Ratio (PSNR) and Structural Similarity (SSIM) were used to quantitatively compare shadow removal images with real non-shadow images. The PSNR and SSIM of SS GAN were 28.46 and 0.95 respectively, which are superior to other methods, indicating that the images processed by SS GAN were closer to the real non-shadow images. Field experiment results showed that SS GAN has excellent shadow removal performance in the self-developed vehicle-mounted detection system. To verify the effect of shadow removal on SMC estimation accuracy, a further field test was conducted to estimate SMC. Comparing SMC estimation results before and after shadow removal, R2 increased from 0.69 to 0.76, and Root Mean Square Error (RMSE) decreased from 1.39 to 0.94%. The results show that the proposed method can effectively remove the shadow of soil images and improve the accuracy of SMC estimation in farmland.
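The abstract above reports PSNR as one of its quantitative measures. The sketch below uses the standard definition, PSNR = 10 · log10(MAX² / MSE), to show how such a figure is computed. The abstract does not state the pixel range or the implementation the authors used, so the function name `psnr`, the flat pixel lists and the default `max_value=255.0` (8-bit images) are assumptions for illustration.

```python
import math


def psnr(reference, processed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists.

    `max_value` is the largest possible pixel value (assumed 255 for 8-bit data).
    """
    if not reference or len(reference) != len(processed):
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - p) ** 2 for r, p in zip(reference, processed)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: no noise at all
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical pixel values: a small per-pixel error gives a finite PSNR.
value = psnr([100.0, 100.0, 100.0, 100.0], [110.0, 90.0, 100.0, 100.0])
```

Higher values mean the processed image is closer to the reference, which is how the abstract interprets its comparison against real non-shadow images.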


One-Shot Voice Conversion Based on Style Generative Adversarial Networks with ESR and DSNet

Li, Pan, Qiu et al. 2024. Circuits Syst Signal Process

