Intelligibility of time-compressed synthetic speech: Compression method and speaking style

Valentini-Botinhao, Cassia; Toman, Markus; Pucher, Michael; Schabus, Dietmar; Yamagishi, Junichi

doi:10.1016/j.specom.2015.09.002

Cited by 4 publications

(4 citation statements)

References 27 publications


“…Ninth, it is unclear how effective the spearcons would be if they were produced from text-to-speech, as many spearcons are (Walker et al, 2006). As noted in the Introduction to Experiment 1, Valentini-Botinhao et al (2015) found that compressed speech was more understandable when based on natural speech than when based on text-to-speech, especially at high compression rates. Whether that would be the same for compressed speech as short as spearcons remains to be seen.…”

Section: Relationship With Conventional Alarms (mentioning)

confidence: 99%

“…Specifically, we used naturally spoken English for the uncompressed speech. A recent study indicated that time-compressed speech based on naturally spoken English leads to fewer comprehension errors than when based on TTS, especially at high compression rates (Valentini-Botinhao, Toman, Pucher, Schabus, & Yamagishi, 2015). We used the utterance position speed-up (UPSU) method to compress the speech, which compresses the start of the utterance slightly less than the end of the utterance to help comprehension as the words are presented (Dupoux & Green, 1997; Tucker & Whittaker, 2008).…”

Section: Experiments 1: Spearcons Versus Earcons For Single-patient M... (mentioning)

confidence: 99%

Monitoring vital signs with time-compressed speech.

Sanderson, Brecknell, Leong et al. 2019

Journal of Experimental Psychology: Applied

Spearcons—time-compressed speech phrases—may be an effective way of communicating vital signs to clinicians without disturbing patients and their families. Four experiments tested the effectiveness of spearcons for conveying oxygen saturation (SpO2) and heart rate (HR) of one or more patients. Experiment 1 demonstrated that spearcons were more effective than earcons (abstract auditory motifs) at conveying clinical ranges. Experiment 2 demonstrated that casual listeners could not learn to decipher the spearcons whereas listeners told the exact vocabulary could. Experiment 3 demonstrated that participants could interpret sequences of sounds representing multiple patients better with spearcons than with pitch-based earcons, especially when tones replaced the spearcons for normal patients. Experiment 4 compared multiple-patient monitoring of two vital signs with either spearcons, a visual display showing SpO2 and HR in the same temporal sequence as the spearcons, or a visual display showing multiple patient levels simultaneously. All displays conveyed which patients were abnormal with high accuracy. Visual displays better conveyed the vital sign levels for each patient, but cannot be used eyes-free. All displays showed accuracy decrements with working memory load. Spearcons may be viable for single and multiple patient monitoring. Further research should test spearcons with more vital signs, during multitasking, and longitudinally.

“…where k=1,.., Ng, Δ lu represents the quant width, whereas Ng is a number of representation levels [3], [11]. Log-uniform quantizer is designed for low and middle bit-rates (number of quantization levels (Ng) is 2, 4, 8 and 16).…”

Section: Transform Coding and Quantizers Design (mentioning)

confidence: 99%

“…This way, signal compression makes storing and transmission of the digital signal easier, since it requires less memory resources and narrower bandwidth for transmission while customers' experience is satisfactory [1], [2]. Although natural speech signal processing is the mostly researched, traditionally, with the growth of information technologies a lot of papers are dedicated to synthetic speech signal processing, due to its' importance in education (distance learning, foreign languages, blind individuals) and automatic recognition [3], [4].…”

Section: Introduction (mentioning)

confidence: 99%

Speech Signal Coding Using Forward Adaptive Quantization and Simple Transform Coding

Tančić, Perić, Tomić et al. 2016

ElAEE

The paper proposes a novel speech signal coding scheme that implements a simple transform coding and forward adaptive quantization. The proposed scheme is adapted to the input signal variance, providing highly efficient bandwidth usage, whereas implemented transform coding provides sub-sequences with more predictable signal characteristics, so that more suitable signal processing can be performed. The aforementioned transform coding precedes adaptive quantization, providing additional compression. The objective quality measure used for system performance estimation is SQNR (signal-to-quantization-noise ratio), which represents a standard measure for lossy coding types. The influence of transform coding is discussed by comparing the obtained results with the corresponding one achieved by applying only the same adaptive quantization. Furthermore, the comparison with system performance of PCM (pulse-code modulation) coding system confirms that the proposed coding scheme has a lot of potential for further implementation, since that the proposed system ensures SQNR gain up to 4.0983 [dB] for various values of system parameters.
