Accessibility in the workplace and in academic settings has improved dramatically for users with disabilities, driven by greater awareness, legislative mandates, and technological advances. Gaps, however, remain. For people who are deaf or hard of hearing in particular, full participation requires complete access to audio materials, both in live settings and in prerecorded audio and visual media. Even for users with adequate hearing, captioned or transcribed materials offer another modality for information access, one that is particularly useful in certain situations, such as listening in noisy environments, understanding speakers with strong accents, or searching audio media for specific information. Providing this level of access through fully automated means is currently beyond the state of the art. This paper details a number of key advances in audio access that have occurred over the last five years. We describe the Liberated Learning Project, a consortium of universities worldwide that is piloting technologies to create real-time access for students who are deaf or hard of hearing, without intermediary assistance. In support of this project, IBM Research has created ViaScribe, a tool that converts speech recognition output into a viable captioning interface. Additional inventions and incremental improvements to speech recognition for captioning are described, as well as future directions.
This paper outlines the background development of "intelligent" technologies such as speech recognition. Despite significant progress, these technologies still fall short in many areas, and advances in areas such as dictation have actually stalled. In this paper we propose semi-automatic solutions: the smart integration of human effort with automated technologies. One such technique involves improving the speech recognition editing interface, thereby reducing the viewer's perception of errors. Other techniques described in the paper are batch enrollment, which reduces the time a user must spend on enrollment, and content spotting, which can be used for applications with a repeated content flow, such as movies or museum tours. The paper also suggests a general concept of distributive training of speech recognition systems based on data collection across a network.
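The abstract does not give implementation details for content spotting, but the underlying idea can be illustrated with a minimal sketch: when recognizer output closely matches a segment of known, repeated content (a tour script, a film transcript), the system substitutes the clean, pre-edited text for the noisy hypothesis. The names `SCRIPT_SEGMENTS`, `spot_content`, and the similarity threshold below are assumptions for illustration, not the paper's actual method.

```python
import difflib

# Hypothetical pre-transcribed segments for content with a repeated flow
# (e.g., a museum tour stop); purely illustrative, not the ViaScribe pipeline.
SCRIPT_SEGMENTS = [
    "welcome to the ancient egypt gallery the artifacts here date back "
    "over four thousand years",
    "this sarcophagus was discovered in nineteen twenty two near luxor",
]

def spot_content(asr_hypothesis: str, threshold: float = 0.75) -> str:
    """Return the verbatim script segment when the ASR output is similar
    enough to known repeated content; otherwise keep the raw ASR text."""
    best_segment, best_score = None, 0.0
    for segment in SCRIPT_SEGMENTS:
        # Word-level similarity between the noisy hypothesis and the script.
        score = difflib.SequenceMatcher(
            None, asr_hypothesis.lower().split(), segment.split()
        ).ratio()
        if score > best_score:
            best_segment, best_score = segment, score
    return best_segment if best_score >= threshold else asr_hypothesis

# A recognition error ("twenty to" instead of "twenty two") is still mapped
# back to the clean, pre-edited caption.
print(spot_content("this sarcophagus was discovered in nineteen twenty to near luxor"))
```

In practice such matching would run incrementally over a live caption stream, but the same principle applies: repeated content lets a system trade open-vocabulary recognition accuracy for verbatim, human-verified text.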