We propose a novel training strategy for Tacotron-based text-to-speech (TTS) systems to improve the expressiveness of synthesized speech. One of the key challenges in prosody modeling is the lack of a reference, which makes explicit modeling difficult. The proposed technique does not require prosody annotations in the training data. Nor does it attempt to model prosody explicitly; instead, it encodes the association between input text and its prosody styles using a Tacotron-based TTS framework. This marks a departure from the style token paradigm, in which prosody is explicitly modeled by a bank of prosody embeddings. The proposed training strategy combines two objective functions: 1) a frame-level reconstruction loss, calculated between the synthesized and target spectral features; and 2) an utterance-level style reconstruction loss, calculated between the deep style features of the synthesized and target speech. The style reconstruction loss is formulated as a perceptual loss to ensure that utterance-level speech style is taken into account during training. Experiments show that the proposed training strategy outperforms a state-of-the-art baseline in both naturalness and expressiveness. To the best of our knowledge, this is the first study to incorporate utterance-level perceptual quality as a loss function into Tacotron training for improved expressiveness.
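The two-term objective described above can be illustrated with a minimal NumPy sketch. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the distance measure, the loss weighting, or the style feature extractor. Here mean squared error is assumed for both terms, `style_weight` is a hypothetical weighting parameter, and `style_features` is a hypothetical stand-in (time averaging followed by a fixed projection) for the deep style features that would come from a pretrained network.

```python
import numpy as np

def frame_reconstruction_loss(pred_mel, target_mel):
    """Frame-level term: mean squared error over all frames and spectral bins.

    Both inputs are arrays of shape (num_frames, num_bins). MSE is an
    assumption; the abstract does not name the distance measure.
    """
    return float(np.mean((pred_mel - target_mel) ** 2))

def style_features(mel, proj):
    """Hypothetical stand-in for a deep style extractor.

    Averages over time to get one vector per utterance, then applies a
    fixed projection and tanh. A real system would use a pretrained network.
    """
    return np.tanh(mel.mean(axis=0) @ proj)

def style_reconstruction_loss(pred_mel, target_mel, proj):
    """Utterance-level term: MSE between style features of the two utterances."""
    diff = style_features(pred_mel, proj) - style_features(target_mel, proj)
    return float(np.mean(diff ** 2))

def total_loss(pred_mel, target_mel, proj, style_weight=1.0):
    """Combined objective: frame-level loss plus weighted style loss.

    style_weight is a hypothetical hyperparameter introduced for illustration.
    """
    return (frame_reconstruction_loss(pred_mel, target_mel)
            + style_weight * style_reconstruction_loss(pred_mel, target_mel, proj))
```

In such a setup the style extractor would typically be kept fixed during TTS training, so that the second term acts as a perceptual loss on the synthesized output rather than as a trainable component; that choice is likewise an assumption of this sketch.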
In this paper, digital image correlation is applied to study the
deformation and damage process of raw coal and briquette under a
complex stress environment. The results show that under symmetrical
loading, the briquette undergoes tensile failure and its strain field
passes through three stages, whereas the raw coal undergoes shear
failure and the stages of its strain field are not distinct. Under
asymmetric loading, the strain field evolution of both raw coal and
briquette shows three characteristic stages, but the briquette is more
prone to strain localization. The crack displacement in the shear
direction is greater than that in the tension direction, so both the
raw coal and the briquette mainly undergo shear failure. The
localization starting stress is determined by a defined statistical
index function, and for both the raw coal and the briquette it has a
quadratic relationship with the asymmetric coefficient.
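
A quadratic relationship of the kind reported above between the starting stress of localization and the asymmetric coefficient could be checked with an ordinary least-squares polynomial fit. The sketch below is illustrative only: the numbers are synthetic values generated from a made-up quadratic, not measurements from the study, and the names `asym_coeff`, `start_stress`, and `fit_quadratic` are hypothetical.

```python
import numpy as np

# Synthetic, illustrative values only; NOT data from the study.
asym_coeff = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
# Made-up quadratic used to generate example stresses (arbitrary units).
start_stress = 2.0 * asym_coeff**2 - 3.0 * asym_coeff + 10.0

def fit_quadratic(x, y):
    """Least-squares fit of y ~ a*x**2 + b*x + c; returns (a, b, c)."""
    a, b, c = np.polyfit(x, y, deg=2)
    return a, b, c

a, b, c = fit_quadratic(asym_coeff, start_stress)
```

With real measurements, the residuals of such a fit would indicate how well a quadratic describes the dependence on the asymmetric coefficient.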