This paper proposes a visuo-auditory substitution method to assist visually impaired people in scene understanding. Our approach focuses on person localisation in the user's vicinity in order to ease urban walking. Since real-time, low-latency processing is required in this context for the user's safety, we propose an embedded system. The processing is based on a lightweight convolutional neural network that performs efficient 2D person localisation. This measurement is enhanced with the corresponding person's depth information and then transcribed into a stereophonic signal via a head-related transfer function. A GPU-based implementation is presented that reaches real-time processing at 23 frames/s on a 640x480 video stream. An experiment shows that this method allows for accurate, real-time audio-based localisation.
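The pipeline above maps a 2D detection plus depth into a direction that an HRTF renderer can spatialize. A minimal sketch of that geometric step is shown below; the camera resolution, field of view, and function name are illustrative assumptions, not details from the paper.

```python
import math

def pixel_to_direction(u, v, depth_m, width=640, height=480, hfov_deg=60.0):
    """Convert a detected person's pixel position and depth into the
    azimuth/elevation/distance triplet an HRTF renderer expects.
    The field of view is an assumed camera parameter."""
    # Focal length in pixels, derived from the horizontal field of view.
    f = (width / 2) / math.tan(math.radians(hfov_deg) / 2)
    # Normalized ray through the pixel (camera frame: x right, y down, z forward).
    x = (u - width / 2) / f
    y = (v - height / 2) / f
    azimuth = math.degrees(math.atan2(x, 1.0))     # left/right angle
    elevation = math.degrees(math.atan2(-y, 1.0))  # up/down angle (y axis points down)
    return azimuth, elevation, depth_m

# A person centred horizontally, slightly above image centre, 3 m away:
az, el, d = pixel_to_direction(320, 200, 3.0)
```

The resulting (azimuth, elevation, distance) triplet would then index a pair of HRTF filters to produce the stereophonic signal.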
Introduction
Visual-to-auditory sensory substitution devices are assistive devices for the blind that convert visual images into auditory images (or soundscapes) by mapping visual features to acoustic cues. To convey spatial information with sound, several sensory substitution devices use a Virtual Acoustic Space (VAS) based on Head-Related Transfer Functions (HRTFs) to synthesize the natural acoustic cues used for sound localization. However, elevation perception is known to be inaccurate with generic spatialization, since it relies on notches in the audio spectrum that are specific to each individual. Another method for conveying elevation information is based on the audiovisual cross-modal correspondence between pitch and visual elevation. The main drawback of this second method is that the narrow spectral band of the sounds limits the ability to perceive elevation through HRTFs.
Method
In this study we compared the early ability to localize objects with a visual-to-auditory sensory substitution device in which elevation is conveyed either by a spatialization-only method (Noise encoding) or by pitch-based methods with different spectral complexities (Monotonic and Harmonic encodings). Thirty-eight blindfolded participants had to localize a virtual target using soundscapes before and after being familiarized with the visual-to-auditory encodings.
Results
Participants localized elevation more accurately with the pitch-based encodings than with the spatialization-only method. Only slight differences in azimuth localization performance were found between the encodings.
Discussion
This study suggests that a pitch-based encoding is intuitive, with a facilitation effect of the cross-modal correspondence when non-individualized sound spatialization is used.
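The pitch-based encodings map target elevation onto tone frequency, with spectral complexity distinguishing the Monotonic (pure-tone) and Harmonic variants. A minimal sketch of that idea follows; the frequency bounds, elevation range, sample rate, and function names are illustrative assumptions, not the study's actual parameters.

```python
import math

def elevation_to_pitch(elevation_deg, f_min=200.0, f_max=2000.0,
                       elev_range=(-45.0, 45.0)):
    """Map a target elevation onto a tone frequency (higher target = higher
    pitch), following the pitch/elevation cross-modal correspondence.
    All numeric bounds here are assumed, not taken from the study."""
    lo, hi = elev_range
    t = (min(max(elevation_deg, lo), hi) - lo) / (hi - lo)
    # Logarithmic interpolation so equal elevation steps map to equal
    # perceived pitch intervals.
    return f_min * (f_max / f_min) ** t

def harmonic_tone(f0, duration_s=0.25, sr=16000, n_harmonics=4):
    """Sum the first few harmonics of f0 (a 'Harmonic'-style encoding);
    with n_harmonics=1 this reduces to a 'Monotonic' pure tone."""
    n = int(duration_s * sr)
    return [sum(math.sin(2 * math.pi * f0 * k * i / sr) / k
                for k in range(1, n_harmonics + 1))
            for i in range(n)]
```

Adding harmonics broadens the spectrum, which is relevant here because wider-band sounds give the HRTF more spectral structure to encode elevation with.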
This paper proposes a new navigation device to assist visually impaired people in reaching a defined destination safely in indoor environments. The approach, based on visual-to-auditory substitution, provides the user with a 2D spatial sound perception of the destination and of nearby dangerous obstacles. Visual markers are placed at several relevant locations to create a mesh of the building in which each marker is visually accessible from another marker. A graph representation of the marker locations and their connections allows a wayfinding algorithm to determine the shortest path to the desired position. The navigation task is achieved by moving from visual marker to visual marker until the destination is reached. These markers can be used independently of any other system or in combination with solutions based on geolocalisation and/or a digital building model. Moreover, further information can be associated with the markers and verbalized to the user, for instance a temporary hazard, the presence of a door, or any other commonly displayed information. The passive visual markers make it easy and quick to deploy a scalable, low-cost solution to "signpost" the environment for users. Combined with our real-time obstacle detection, their analysis improves the navigational abilities of visually impaired people.
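The marker graph and shortest-path computation described above can be sketched with a standard Dijkstra search; the graph layout, marker names, and edge distances below are hypothetical, and the paper does not specify which wayfinding algorithm it uses.

```python
import heapq

def shortest_marker_path(graph, start, goal):
    """Dijkstra's algorithm over a marker graph of the form
    {marker: {neighbour: distance_in_metres}}.
    Returns the marker sequence to guide the user through, or None."""
    queue = [(0.0, start, [start])]  # (cumulative distance, marker, path so far)
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        for nxt, dist in graph.get(node, {}).items():
            if nxt not in visited:
                heapq.heappush(queue, (cost + dist, nxt, path + [nxt]))
    return None

# Hypothetical mesh of a small building; each marker is visually
# reachable from its neighbours.
mesh = {
    "entrance": {"hall": 5.0},
    "hall": {"entrance": 5.0, "stairs": 4.0, "office": 7.0},
    "stairs": {"hall": 4.0, "office": 2.0},
}
```

Here `shortest_marker_path(mesh, "entrance", "office")` prefers the route through "stairs" (11 m) over the direct hall-to-office edge (12 m), and the returned marker sequence is what would be announced to the user step by step.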