A proof-of-concept system is developed to provide a broad assessment of speech development issues in children. It is designed to enable non-experts to complete an initial screening of children's speech, with the aim of reducing the workload on Speech Language Pathology services. The system comprises an acoustic model trained using neural networks with split temporal context features, and a constrained HMM encoding the knowledge of Speech Language Pathologists. Results demonstrate that the system reduces PER by 33% compared with standard HMM decoders, achieving a minimum PER of 19.03%. Identification of Phonological Error Patterns with up to 94% accuracy was achieved despite using only a small corpus of disordered speech from Australian children. These results indicate that the proposed system is viable; directions for further development are outlined in the paper.
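The PER figures quoted above follow the standard phoneme error rate definition: the Levenshtein edit distance between the reference and recognised phoneme sequences, divided by the reference length. A minimal sketch (the phoneme strings below are hypothetical illustrations, not data from the paper):

```python
def phoneme_error_rate(ref, hyp):
    """PER = edit distance between phoneme sequences / length of reference."""
    m, n = len(ref), len(hyp)
    # Dynamic-programming table for insertions, deletions, and substitutions.
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n] / m

# Example: one substituted phoneme out of three -> PER of 1/3.
ref = ["k", "ae", "t"]   # "cat"
hyp = ["k", "ae", "p"]   # final phoneme misrecognised
print(phoneme_error_rate(ref, hyp))  # 1/3 ≈ 0.33
```

A 33% relative improvement, as reported, means this quantity falls by a third relative to the baseline decoder's PER.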
Hearing loss is widespread and significantly impacts an individual's ability to engage with broadcast media. Access can be improved through new object-based audio personalization methods. Drawing on the literature on hearing loss and intelligibility, this paper identifies three dimensions evidenced to improve intelligibility: spatial separation, speech-to-noise ratio, and redundancy. These can be personalized, individually or concurrently, using object-based audio. A systematic review of work on object-based audio personalization is then undertaken. The three dimensions are used to evaluate each project's approach to personalization, identifying successful approaches, commercial challenges, and the next steps required to ensure continuing improvements to broadcast audio for hard-of-hearing individuals.
In everyday life, speech is often accompanied by a situation-specific acoustic cue: a hungry bark as you ask 'Has anyone fed the dog?'. This paper investigates the effect such cues have on speech intelligibility in noise and evaluates their interaction with the established effect of situation-specific semantic cues. This work is motivated by the introduction of new object-based broadcast formats, which have the potential to optimise intelligibility by controlling the level of individual broadcast audio elements at the point of service. Results of this study show that situation-specific acoustic cues alone can improve word recognition in multi-talker babble by 69.5%, a similar amount to semantic cues. The combination of both semantic and acoustic cues provides a further improvement of 106.0% compared with no cues, and 18.7% compared with semantic cues only. Interestingly, whilst increasing subjective intelligibility of the target word, the presence of acoustic cues degraded the objective intelligibility of the speech-based semantic cues by 47.0% (equivalent to reducing the speech level by 4.5 dB). This paper discusses the interactions between the two types of cues and the implications that these results have for assessing and improving the intelligibility of broadcast speech.
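To make the quoted 4.5 dB equivalence concrete: using the standard 20·log₁₀ amplitude convention (not a mapping specific to this study), a 4.5 dB level reduction corresponds to roughly 60% of the original amplitude. A small sketch:

```python
import math

def db_change_to_amplitude_ratio(delta_db):
    """Convert a level change in dB to a linear amplitude ratio (20*log10 convention)."""
    return 10 ** (delta_db / 20)

# A -4.5 dB reduction leaves roughly 0.6x the original amplitude.
ratio = db_change_to_amplitude_ratio(-4.5)
print(round(ratio, 3))

# Sanity check in the other direction: -6 dB is close to half amplitude.
half = db_change_to_amplitude_ratio(-6.0)
print(round(half, 3))
```

So the measured degradation of the semantic cues behaves as if the cue speech had been attenuated to about 0.6 of its amplitude.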
This paper describes the continued development of a system to provide early assessment of speech development issues in children and better triaging to professional services. Whilst corpora of children's speech are increasingly available, recognition of disordered children's speech remains a data-scarce task. Transfer learning methods have been shown to be effective at leveraging out-of-domain data to improve ASR performance in similar data-scarce applications. This paper combines transfer learning with previously developed methods for constrained decoding based on expert speech pathology knowledge and knowledge of the target text. Results of this study show that transfer learning with out-of-domain adult speech can improve phoneme recognition for disordered children's speech. Specifically, a Deep Neural Network (DNN) trained on adult speech and fine-tuned on a corpus of disordered children's speech reduced the phoneme error rate (PER) of a DNN trained on a children's corpus from 16.3% to 14.2%. Furthermore, this fine-tuned DNN also improved the performance of the Hierarchical Neural Network based acoustic model previously used by the system, which had a PER of 19.3%. We close with a discussion of our planned future developments of the system.
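The transfer learning recipe described (pretrain on plentiful out-of-domain adult speech, then fine-tune on scarce in-domain children's speech) can be illustrated with a deliberately tiny numeric sketch. This is not the authors' DNN pipeline; it is a one-parameter toy model showing why starting from pretrained weights beats training from scratch when in-domain data and update steps are limited:

```python
def train(w, data, lr=0.01, steps=5):
    """One-parameter linear model y = w*x, trained by squared-error gradient descent."""
    for _ in range(steps):
        for x, y in data:
            grad = 2 * (w * x - y) * x
            w -= lr * grad
    return w

# Hypothetical stand-ins: a large "out-of-domain" set generated with slope 2.0
# (adult speech proxy) and a small "in-domain" set with slope 2.5 (children proxy).
out_domain = [(x / 100, 2.0 * x / 100) for x in range(1, 101)]
in_domain = [(0.5, 1.25), (1.0, 2.5), (1.5, 3.75)]

w_pretrained = train(0.0, out_domain, steps=20)       # pretraining phase
w_transfer = train(w_pretrained, in_domain, steps=5)  # fine-tuning on scarce data
w_scratch = train(0.0, in_domain, steps=5)            # same budget, no transfer

# With the same few in-domain updates, the fine-tuned model lands closer
# to the in-domain optimum (2.5) than the model trained from scratch.
print(abs(w_transfer - 2.5) < abs(w_scratch - 2.5))  # True
```

The fine-tuned model starts near the out-of-domain solution, so the small in-domain set only has to correct a residual mismatch rather than learn from nothing, which is the same intuition behind the PER reduction from 16.3% to 14.2% reported above.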
For traditional broadcasting formats, implementation of accessible audio strategies for hard-of-hearing people has used a binary, intelligibility-based approach. In this approach, sounds are categorized either as speech, contributing to comprehension of content, or non-speech, which can mask the speech and reduce intelligibility. Audio accessibility solutions have therefore focused on speech enhancement type methods, for which several useful standard objective measures of quality exist. Recent developments in next-generation broadcast audio formats, in particular the roll-out of object-based audio, facilitate more in-depth personalisation of the audio experience based on user preferences and needs. Recent research has demonstrated that many non-speech sounds do not strictly behave as maskers but can be critical for comprehension of the narrative for some viewers. This complex relationship between speech, non-speech audio, and the viewer necessitates a more holistic approach to understanding quality of experience of accessible media. This paper reviews previous work and outlines such an approach, discussing accessibility strategies using next-generation audio formats and their implications for developing effective assessments of quality.