Most vision-based approaches to mobile robotics suffer from the limitations imposed by stereo obstacle detection, which is short-range and prone to failure. We present a self-supervised learning process for long-range vision that is able to accurately classify complex terrain at distances up to the horizon, thus allowing superior strategic planning. The success of the learning process is due to the self-supervised training data that is generated on every frame: robust, visually consistent labels from a stereo module, normalized wide-context input windows, and a discriminative and concise feature representation. A deep hierarchical network is trained to extract informative and meaningful features from an input image, and the features are used to train a realtime classifier to predict traversability. The trained classifier sees obstacles and paths from 5 to over 100 meters, far beyond the maximum stereo range of 12 meters, and adapts very quickly to new environments. The process was developed and tested on the LAGR mobile robot. Results from a ground truth dataset are given as well as field test results.
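To make the per-frame training loop concrete, here is a minimal Python sketch under stated assumptions. It is not the authors' implementation: the deep feature extractor is replaced by a trivial stand-in, the classifier is a plain online logistic regression, and all names (extract_features, OnlineTraversabilityClassifier, the window sizes) are hypothetical. It only illustrates the core idea that stereo-labeled near-field windows train a fast classifier on every frame, whose predictions are then applied to far-field windows beyond stereo range.

```python
import numpy as np

def extract_features(windows):
    """Stand-in for the trained deep feature extractor:
    flatten each input window and L2-normalize it."""
    x = windows.reshape(len(windows), -1).astype(np.float64)
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-8)

class OnlineTraversabilityClassifier:
    """Logistic regression updated online (SGD) on every frame."""
    def __init__(self, dim, lr=0.05):
        self.w = np.zeros(dim)
        self.b = 0.0
        self.lr = lr

    def predict(self, feats):
        return 1.0 / (1.0 + np.exp(-(feats @ self.w + self.b)))

    def update(self, feats, labels):
        p = self.predict(feats)
        err = p - labels                      # gradient of the logistic loss
        self.w -= self.lr * feats.T @ err / len(labels)
        self.b -= self.lr * err.mean()

# Per-frame loop with synthetic data: near-field windows carry stereo labels
# (1 = traversable, 0 = obstacle); the classifier then scores far-field windows.
rng = np.random.default_rng(0)
clf = OnlineTraversabilityClassifier(dim=3 * 25 * 25)
for frame in range(10):
    near_windows = rng.random((64, 3, 25, 25))       # windows within stereo range
    stereo_labels = rng.integers(0, 2, 64).astype(float)
    clf.update(extract_features(near_windows), stereo_labels)

    far_windows = rng.random((256, 3, 25, 25))       # windows beyond stereo range
    traversability = clf.predict(extract_features(far_windows))
```

The online update is what lets the system adapt quickly to a new environment: each new frame contributes fresh stereo-labeled examples before the classifier is queried on the long-range portion of the image.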
Background

There has been considerable work on tracking systems; see, for example, [11] and [9]. Our system draws ideas from these and other earlier work. While many of the basic ideas are similar, the details are often quite different, and those details account for the system's unique abilities.

Some of the major differences stem from our area of application. Our goal is to track targets in a perimeter-security setting, i.e. outdoor operation in areas of moderate to high cover. We seek real-time algorithms suitable for COTS (Commercial Off-The-Shelf) computing, and we use x86-based processors. This domain of application significantly restricts the techniques that can be applied. Some of the constraints, and their implications for our system, include:

- The lighting is naturally varying. We must handle sunlight filtered through trees and intermittent cloud cover. (We are not yet considering IR cameras.)
- Targets use camouflage, so it is unlikely that color will add much information. Figure 3 shows an example scene with a sniper in the grass.
- Targets will be moving in areas with large amounts of occlusion; finding and classifying outlines will be difficult.
- Trees, brush, and clouds all move. The system must have algorithms to help distinguish these "insignificant" motions from target motions.
- Many targets will move slowly (less than 1/60 pixel per frame); some will move even more slowly, and some will try very hard to blend into the motion of the trees and brush. Therefore frame-to-frame differencing is of limited value, and temporal adaptation schemes must not add slow targets to the background (see the sketch after this list).
- Targets will not, in general, be "upright" or isolated. Thus we have not added "labeling" of targets based on simple shape/scale/orientation models.
- Targets need to be detected quickly, while they are still very small and distant, e.g. about 10-20 pixels on target.
- Correlation, template matching, and related techniques cannot be used effectively because of the large amounts of occlusion, and because image translation is a very poor model in a paraimage: objects translating in the world undergo rotation and non-linear scaling.

Note that, except for the last, these are all generic problem constraints and are not dependent on the geometry of the paraimage. If a system can track under these constraints, it can be used in many situations, not just omni-directional tracking in outdoor settings.

We also note that the detection phase is crucial: if targets are not detected, they will not be tracked. Detection is also an area where the domain constraints make the problem more difficult than the situations considered in most past papers. As a result, much of this paper (and of the system's effort) is concentrated on the detection phase. Because of the camouflage and

(This work was supported in part by DARPA Image Understanding's VSAM program.)
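As a minimal illustration of the temporal-adaptation constraint above (not the system described here), the following sketch uses a generic running-average background model whose update is gated per pixel: pixels that already differ noticeably from the background are not blended in, so a slowly moving target is less likely to be absorbed. The function name, thresholds, and synthetic data are all hypothetical.

```python
import numpy as np

def update_background(background, frame, alpha=0.01, gate=12.0):
    """Blend the frame into the background only at pixels whose absolute
    difference from the background is below `gate` grey levels."""
    diff = np.abs(frame.astype(np.float64) - background)
    stable = diff < gate                      # pixels considered background-like
    updated = background.copy()
    updated[stable] = ((1.0 - alpha) * background[stable]
                       + alpha * frame[stable].astype(np.float64))
    foreground_mask = ~stable                 # candidate target / motion pixels
    return updated, foreground_mask

# Example with synthetic frames.
rng = np.random.default_rng(1)
bg = np.full((120, 160), 100.0)
frame = bg + rng.normal(0, 2, bg.shape)       # mostly background noise
frame[40:60, 70:90] += 40.0                   # a small, slowly moving target
bg, fg = update_background(bg, frame)
print(fg.sum(), "pixels flagged as potential target motion")
```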
Abstract - We present a learning process for long-range vision that is able to accurately classify complex terrain at distances up to the horizon, thus allowing high-level strategic planning. A deep belief network is trained to extract informative and meaningful features from an input image, and the features are used to train a realtime classifier to predict traversability. A hyperbolic polar coordinate map is used to accumulate the terrain predictions of the classifier. The process was developed and tested on the LAGR mobile robot.
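The hyperbolic polar coordinate map can be sketched roughly as follows. This is an assumption-laden illustration rather than the paper's implementation: cells are indexed by bearing and by a hyperbolically spaced range, so resolution is fine near the robot and a single outer ring reaches toward the horizon, and classifier outputs are simply averaged per cell. The class and method names are hypothetical.

```python
import numpy as np

class HyperbolicPolarMap:
    """Robot-centered map: bearing bins x hyperbolically spaced range rings."""
    def __init__(self, n_angles=120, n_rings=30, r_min=0.5, r_max=100.0):
        self.n_angles = n_angles
        self.n_rings = n_rings
        self.r_min = r_min
        self.r_max = r_max
        self.sum = np.zeros((n_angles, n_rings))
        self.count = np.zeros((n_angles, n_rings))

    def _cell(self, x, y):
        r = np.hypot(x, y)
        theta = np.arctan2(y, x)
        a = int((theta + np.pi) / (2 * np.pi) * self.n_angles) % self.n_angles
        # hyperbolic range spacing: ring index grows like 1 - r_min / r,
        # so cells are dense near the robot and sparse far away
        u = np.clip(1.0 - self.r_min / max(r, self.r_min), 0.0, 1.0)
        u_max = 1.0 - self.r_min / self.r_max
        ring = min(int(u / u_max * (self.n_rings - 1)), self.n_rings - 1)
        return a, ring

    def add(self, x, y, traversability):
        """Accumulate a classifier prediction for a point (x, y) in meters."""
        a, ring = self._cell(x, y)
        self.sum[a, ring] += traversability
        self.count[a, ring] += 1

    def query(self, x, y, default=0.5):
        """Return the mean accumulated traversability for the cell at (x, y)."""
        a, ring = self._cell(x, y)
        if self.count[a, ring] == 0:
            return default
        return self.sum[a, ring] / self.count[a, ring]

# Example: accumulate classifier outputs in robot-centered coordinates.
m = HyperbolicPolarMap()
m.add(4.0, 1.0, 0.9)      # nearby point, likely traversable
m.add(80.0, -20.0, 0.2)   # distant point, likely an obstacle
print(m.query(4.0, 1.0), m.query(80.0, -20.0))
```

A map of this shape lets a single data structure hold both the detailed near field used for local planning and the coarse long-range predictions used for strategic planning.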