This paper describes recent work on the “Crosswatch” project [7], a computer vision-based smartphone system developed to provide guidance to blind and visually impaired travelers at traffic intersections. A key function of Crosswatch is self-localization: estimating the user’s location relative to the crosswalks in the current traffic intersection. Such information may be vital to users with little or no vision, who must know which crosswalk they are about to enter and whether they are properly aligned and positioned relative to it. However, while computer vision-based methods have been used [1,9,14] to find crosswalks and help blind travelers align themselves to them, these methods assume that the entire crosswalk pattern can be imaged in a single video frame, which poses a significant challenge for a user who lacks sufficient vision to know where to point the camera to frame the crosswalk properly.
In this paper we describe work in progress that tackles the problems of crosswalk detection and self-localization, building on recent work [8] on techniques that enable blind and visually impaired users to acquire 360° image panoramas while turning in place on a sidewalk. The panorama is converted to an aerial (overhead) view of the nearby intersection, centered on the location where the user is standing, to facilitate matching against a template of the intersection obtained from Google Maps satellite imagery. The matching process detects crosswalk features and yields an estimate of the user’s precise location relative to the crosswalk of interest. We demonstrate our approach on intersection imagery acquired by blind users, thereby establishing the feasibility of the approach.
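To make the pipeline concrete, the following is a minimal sketch of the two stages described above: projecting a panorama onto the ground plane to obtain an overhead view, and matching that view against a satellite-derived template. It is illustrative only; the function names, the assumed equirectangular panorama format, the camera height, and the brute-force rotation search are all our assumptions here, not the actual implementation of [7,8].

```python
# Illustrative sketch of the panorama-to-aerial-view and template-matching
# pipeline. Assumes an equirectangular 360-degree panorama, a known camera
# height, and grayscale 8-bit images; the paper's actual method may differ.
import numpy as np
import cv2


def panorama_to_aerial(pano, cam_height_m=1.5, ground_radius_m=15.0,
                       out_size=400):
    """Project an equirectangular panorama onto the ground plane,
    producing an overhead view centered on the user (assumed geometry)."""
    H, W = pano.shape[:2]
    half = out_size // 2
    # Metric ground coordinates for each output pixel; user at the center.
    xs = (np.arange(out_size) - half) * (ground_radius_m / half)
    gx, gy = np.meshgrid(xs, xs)
    r = np.hypot(gx, gy)
    azimuth = np.arctan2(gx, gy)               # 0 = straight ahead
    elevation = -np.arctan2(cam_height_m, r)   # ground points lie below horizon
    # Equirectangular lookup: azimuth -> column, elevation -> row.
    map_u = ((azimuth / (2 * np.pi)) % 1.0) * (W - 1)
    map_v = (0.5 - elevation / np.pi) * (H - 1)
    return cv2.remap(pano, map_u.astype(np.float32),
                     map_v.astype(np.float32), cv2.INTER_LINEAR)


def localize(aerial, satellite, angles=range(0, 360, 2)):
    """Estimate the user's position and heading by matching the aerial view
    against a (larger) satellite image of the intersection over a sweep of
    candidate rotations. Returns (score, (x, y), angle_degrees)."""
    best = (-1.0, None, None)
    h, w = aerial.shape[:2]
    center = (w / 2, h / 2)
    for angle in angles:
        # Rotation padding introduces dark corners that slightly bias the
        # normalized correlation; acceptable for a sketch.
        rot = cv2.getRotationMatrix2D(center, angle, 1.0)
        rotated = cv2.warpAffine(aerial, rot, (w, h))
        scores = cv2.matchTemplate(satellite, rotated, cv2.TM_CCOEFF_NORMED)
        _, score, _, loc = cv2.minMaxLoc(scores)
        if score > best[0]:
            # loc is the match's top-left corner; the user stands at the
            # center of the aerial view.
            best = (score, (loc[0] + w // 2, loc[1] + h // 2), angle)
    return best
```

In practice the satellite template and the aerial view would also need to be brought to a common scale (meters per pixel), and crosswalk-specific features rather than raw intensities would likely make the matching far more robust; the sketch uses plain normalized cross-correlation only for brevity.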