Identifying the vocal cues of likeability, friendliness and skilfulness in synthetic speech

Rallabandi, Sai Sirisha; Naderi, Babak; Möller, Sebastian

doi:10.21437/ssw.2021-1

Cited by 3 publications

(8 citation statements)

References 0 publications


“…The evaluation of these systems have consistently suggested improvements in the existing synthesis procedure [14,15,16,17]. In our previous work, we have studied the commercial TTS systems, Google 1 and Amazon voices 2 [18,19]. Our study shows various speaker attributes contributing to the perception of the universal dimensions (warmth and competence) in synthetic speech [18].…”

Section: Introduction (mentioning)

confidence: 86%

“…DSSC represents the desired social speaker characteristics from synthetic speech [18]. DAV represents derived acoustic features [19]. To generate highly warm female speech, we studied the acoustic features that are commonly found in the speaker attributes, likeability and friendliness: F1 mean, F2 mean, spectral flux.…”

Section: Overview (mentioning)

confidence: 99%

“…The openSMILE feature, F1 mean which is termed as, 'F1frequencysma3nzamean' was considered as one of the acoustic features responsible for both friendliness and likeability in female speech [19]. The maximum and minimum values were: 720.…”

Section: Experiments 1: F1 Mean (mentioning)

confidence: 99%

“…The openSMILE feature, F2 mean which is termed as, 'F2frequencysma3nzamean' was considered as one of the acoustic features responsible for the speaker attributes friendliness and likeability in female speech [19]. The maximum and minimum values were: 1957.…”

Section: Experiments 2: F2 Mean (mentioning)

confidence: 99%

“…Our study shows various speaker attributes contributing to the perception of the universal dimensions (warmth and competence) in synthetic speech [18]. In [19], we have derived the acoustic features that could affect warmth and competence in commercial TTS voices using linear regression. The speaker attributes we have considered in the study were as follows: friendliness and likability for warmth; skilfulness for competence.…”

Section: Introduction (mentioning)

confidence: 99%


On incorporating social speaker characteristics in synthetic speech

Rallabandi, Möller

2022

Preprint

Self Cite


In our previous work, we derived the acoustic features that contribute to the perception of warmth and competence in synthetic speech. As an extension, in our current work we investigate the impact of the derived vocal features on the generation of the desired characteristics. Spectral flux, F1 mean, F2 mean, and their convex combinations were explored for the generation of higher warmth in female speech. The voiced slope, spectral flux, and their convex combinations were investigated for the generation of higher competence in female speech. We employed a feature quantization approach in a traditional end-to-end Tacotron-based speech synthesis model. The listening tests showed that convex combinations of acoustic features yield higher Mean Opinion Scores for warmth and competence than individual features.


Mixed-Cultural Speech for Intelligent Virtual Agents - the Impact of Different Non-Native Accents Using Natural or Synthetic Speech in the English Language

Obremski, Hering, Friedrich et al.

2022

Proceedings of the 10th International Conference on Human-Agent Interaction


An acoustic study on character voices of dominators and subordinates: A case study on male characters in Empresses in the Palace

Liu, Zhang, Liang

2023

Front. Commun.


Introduction: Voice has been used to project identity in dubbing, in order to portray appropriate role images aurally in TV dramas. This study investigates the character voices of leading male characters in Empresses in the Palace. Methods: The acoustic characteristics of character voices, and the matching relation between acoustics and role images, are explored by comparing F0, CPP, and harmonic amplitude differences of the speech spectrum. Results: The voice quality of characters is related to their relative social status. The subordinates usually adopt a higher pitch or breathy voice, while the dominators use a lower pitch or modal/creaky voice. In addition, CPP, F0, and H1-A3 are the key acoustic indicators for distinguishing character voices. Discussion: These results reveal the acoustic characteristics of certain types of character voices and provide guidance for vivid dubbing.
