New methods of securing the distribution of audio content have been widely deployed in the last twenty years. Their impact on perceived quality, however, has seldom been the subject of extensive recent research. We review the state of the art in digital speech watermarking and present subjective tests of watermarked speech samples. The latest speech watermarking techniques are listed, together with their specifics and potential for further development, and their current and possible applications are evaluated. Open-source software designed to embed watermarking patterns in audio files is used to produce a set of samples that satisfies the requirements of modern subjective speech-quality assessments. The analysis mainly considers the patchwork algorithm implemented in that application. Several watermark robustness levels are used, which makes it possible to determine the detection threshold for human listeners. The subjective listening tests follow ITU-T Recommendation P.800, which precisely defines the conditions and requirements for subjective testing. Further analysis tries to determine the effects of noise and various disturbances on the perceived quality of watermarked speech. An intelligibility threshold is estimated to open further possibilities for speech compression techniques combined with watermarking. The impact of language or social background is evaluated through an additional experiment involving two groups of listeners. Results show significant robustness of the watermarking implementation, which retains both reasonable net subjective audio quality and its security attributes despite mild levels of distortion and noise. Extended experiments with Chinese listeners make it possible to formulate a hypothesis on variations in perception across geographical and social backgrounds.
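The patchwork idea can be illustrated with a minimal sketch. It assumes the classic time-domain variant: two disjoint pseudo-random sample sets are derived from a secret key, one is raised and the other lowered by a small offset `delta`. The open-source application used in the study may work differently (for example, on transform coefficients). The names `patch_indices`, `embed`, and `detect`, the use of `delta` as a stand-in for the robustness level, and the synthetic host signal are all hypothetical.

```python
import random


def patch_indices(n_samples, n_pairs, key):
    """Derive two disjoint pseudo-random index sets A and B from a secret key (hypothetical helper)."""
    rng = random.Random(key)
    idx = rng.sample(range(n_samples), 2 * n_pairs)
    return idx[:n_pairs], idx[n_pairs:]


def embed(samples, key, delta, n_pairs):
    """Raise the samples in A by delta and lower the samples in B by delta."""
    a, b = patch_indices(len(samples), n_pairs, key)
    marked = list(samples)
    for i in a:
        marked[i] += delta
    for i in b:
        marked[i] -= delta
    return marked


def detect(samples, key, n_pairs):
    """Return mean(A) - mean(B): near 2*delta if marked with this key, near 0 otherwise."""
    a, b = patch_indices(len(samples), n_pairs, key)
    mean_a = sum(samples[i] for i in a) / n_pairs
    mean_b = sum(samples[i] for i in b) / n_pairs
    return mean_a - mean_b


if __name__ == "__main__":
    # Synthetic zero-mean host signal standing in for one second of 48 kHz speech.
    host_rng = random.Random(1)
    host = [host_rng.gauss(0.0, 0.3) for _ in range(48000)]
    marked = embed(host, key=1234, delta=0.05, n_pairs=2000)
    print(detect(marked, 1234, 2000))  # close to 2 * delta
    print(detect(host, 1234, 2000))    # unmarked signal: close to 0
    print(detect(marked, 9999, 2000))  # wrong key: close to 0
```

In this sketch a larger `delta` makes the detection statistic stand out more clearly from host-signal noise, but it also adds a stronger, potentially audible perturbation. That is the trade-off the listening tests at several robustness levels are designed to probe.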
Perceiving transmitted speech puts a certain amount of cognitive load on the human brain. The degree of this load depends on several factors, e.g., the loudness of the perceived speech, the type and intensity of background noise, the quality and accent of the speech, and familiarity with the topic of the message. The load also differs depending on whether the speech is in the listener's native or a non-native language. Over longer periods of work (e.g., during a work shift), different levels of such load manifest as different levels of overall fatigue, which affects the worker's action or decision error rate when performing other concurrent tasks (the so-called parallel-task paradigm). For technologies used in speech transmission or synthesis, e.g., in telecommunications, radio communications, and machine-to-human communications, this implies a strong need to optimize the coding of human (or synthetic) voice so as to minimize listening effort during communication. Listening effort (LE) can be assessed by subjective tests following, e.g., ITU-T Recommendation P.800, which also specifies the assessment of listening quality (LQ). A natural (but nowhere explicitly stated) requirement is that male and female voices be transmitted with similar LQ and LE values; in other words, the transmission technology, including coding algorithms, frequency filters, and sampling rates, should not favor one gender over the other, so that working conditions and opportunities remain similar for all. Since 2018, the subjective test laboratory has performed a gender analysis for all of its subjective test projects to see how (im)balanced the transmission quality is between male and female speakers.
The identified imbalance can affect many professionals who rely on remote voice communication in their daily duties. Consider female airport approach controllers or other professionals (e.g., policewomen) who are disadvantaged by the technological aspects of their job: worse voice transmission quality means that more listening effort is needed, and it may cause (subconscious) discomfort for their communication partners or even intelligibility issues. This is, of course, not surprising for narrow-band or older analog AM transmissions (as still used in AIRCOM); there it can only serve as an argument for upgrading communication means to a suitable digital format. Unfortunately, some contemporary wide-band and even full-band digital communication systems also show statistically significant differences between the quality of transmitted male and female voices. Detailed results are presented, including interesting systematic language dependencies (English, German, Mandarin). In the conclusions, suggestions are proposed for future codec designs that take human-centric, gender-balanced requirements into account. These include the minimum frequency response of future coders and the granularity of perceptual frequency scaling, among others. Suggestions for the gender neutrality of the original (studio-quality) recordings used to prepare speech samples for subjective tests are also included.
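The per-gender comparison of listening-quality scores described in the preceding two paragraphs can be sketched as follows. The sketch assumes ratings on the five-point P.800 ACR listening-quality scale (5 = excellent, 1 = bad). Each speaker group is summarized by its mean opinion score (MOS) with a normal-approximation 95% confidence interval, and the two groups are compared with Welch's t statistic. The function names and the ratings in the usage example are hypothetical and are not results from the study. The statistical procedure the laboratory actually uses is not specified in the text, and small listener panels would normally use a Student's t quantile instead of 1.96.

```python
import math
import statistics


def mos_with_ci(ratings, z=1.96):
    """Mean opinion score and half-width of a normal-approximation 95% CI (hypothetical helper)."""
    mos = statistics.mean(ratings)
    half_width = z * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


def welch_t(x, y):
    """Welch's t statistic for the difference in mean ratings between two groups."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return (statistics.mean(x) - statistics.mean(y)) / se


if __name__ == "__main__":
    # Made-up ACR ratings for one codec condition, split by speaker gender.
    male = [4, 5, 4, 4, 3, 5, 4, 4]
    female = [3, 4, 3, 4, 3, 4, 3, 3]
    for label, ratings in (("male", male), ("female", female)):
        mos, ci = mos_with_ci(ratings)
        print(f"{label}: MOS = {mos:.3f} +/- {ci:.3f}")
    print(f"Welch t = {welch_t(male, female):.2f}")
```

Welch's form is chosen because the two groups need not share a variance or sample size. Turning the t statistic into a p-value requires the Student's t distribution (available, e.g., in SciPy), which Python's standard library does not provide.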
Present-day telecommunication devices are rarely used by comfortably seated users who perform no other parallel task. A typical communication scenario involves walking, driving a car, watching TV, or working on a PC during a conversation. Transmission quality, however, has traditionally been evaluated under ideal laboratory conditions. The authors of this paper have prepared a new standard for subjective transmission quality testing with a parallel task, which has been approved as ETSI TR 103 503. The paper summarizes the most widely used transmission quality testing methods, discusses their disadvantages, and introduces a new testing methodology with a parallel task. It also presents two experiments performed in parallel-task scenarios and highlights some differences in human perception in these scenarios.