Despite their growing presence in home computer applications and various telephony services, commercial automatic speech recognition technologies are still not easily employed by everyone; especially individuals with speech disorders. In addition, relatively little research has been conducted on automatic speech recognition performance with older adults, in whom speech disorders are commonly present. As one ages, the older adult voice naturally begins to resemble some aspects of mildly dysarthric speech. Dysarthria, a common neuromotor speech disorder, is particularly useful for exploring performance limitations of automatic speech recognizers owing to its wide range of speech expression. This article reviews clinical research literature examining the use of commercial speech-to-text automatic speech recognition technology by individuals with dysarthria. The main factors limiting automatic speech recognition performance with dysarthric speakers are highlighted and extended to the elderly using a specific example of a novel, automated, speech-based personal emergency response system for older adults.
BackgroundThe purpose of this study was to derive data from real, recorded, personal emergency response call conversations to help improve the artificial intelligence and decision making capability of a spoken dialogue system in a smart personal emergency response system. The main study objectives were to: develop a model of personal emergency response; determine categories for the model’s features; identify and calculate measures from call conversations (verbal ability, conversational structure, timing); and examine conversational patterns and relationships between measures and model features applicable for improving the system’s ability to automatically identify call model categories and predict a target response.MethodsThis study was exploratory and used mixed methods. Personal emergency response calls were pre-classified according to call model categories identified qualitatively from response call transcripts. The relationships between six verbal ability measures, three conversational structure measures, two timing measures and three independent factors: caller type, risk level, and speaker type, were examined statistically.ResultsEmergency medical response services were the preferred response for the majority of medium and high risk calls for both caller types. Older adult callers mainly requested non-emergency medical service responders during medium risk situations. By measuring the number of spoken words-per-minute and turn-length-in-words for the first spoken utterance of a call, older adult and care provider callers could be identified with moderate accuracy. Average call taker response time was calculated using the number-of-speaker-turns and time-in-seconds measures. Care providers and older adults used different conversational strategies when responding to call takers. The words ‘ambulance’ and ‘paramedic’ may hold different latent connotations for different callers.ConclusionsThe data derived from the real personal emergency response recordings may help a spoken dialogue system classify incoming calls by caller type with moderate probability shortly after the initial caller utterance. Knowing the caller type, the target response for the call may be predicted with some degree of probability and the output dialogue could be tailored to this caller type. The average call taker response time measured from real calls may be used to limit the conversation length in a spoken dialogue system before defaulting to a live call taker.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.