The detection of object boundaries is a critical first step for many visual processing tasks. Multiple cues (we consider luminance, color, motion and binocular disparity) available in the early visual system may signal object boundaries, but little is known about their relative diagnosticity and how to optimally combine them for boundary detection. This study thus aims to understand how early visual processes inform boundary detection in natural scenes. We collected color binocular video sequences of natural scenes to construct a video database. Each scene was annotated with two full sets of ground-truth contours (one set limited to object boundaries and another set which included all edges). We implemented an integrated computational model of early vision that spans all considered cues, and then assessed their diagnosticity by training machine learning classifiers on individual channels. Color and luminance were found to be the most diagnostic cues, while stereo and motion were the least diagnostic. Combining all cues yielded a significant improvement in accuracy beyond that of any cue in isolation. Furthermore, the accuracy of individual cues was found to be a poor predictor of their unique contribution to the combination. This result suggested a complex interaction between cues, which we further quantified using regularization techniques. Our systematic assessment of the accuracy of early vision models for boundary detection, together with the resulting annotated video dataset, should provide a useful benchmark for the development of higher-level models of visual processing.
Curve fragments, as opposed to unorganized edge elements, are of interest and use in a large number of applications such as multiview reconstruction, tracking, motion-based segmentation, and object recognition. A large number of contour grouping algorithms have been developed, but progress in this area has been hampered by the fact that current evaluation methodologies are mainly edge-based, thus ignoring how edges are grouped into contour segments. We show that edge-based evaluation schemes work poorly for the comparison of curve fragment maps, motivating two novel developments: (i) the collection of new human ground truth data whose primary representation is contour fragments and where the goal of collection is not distinguished objects but curves evident in image data, and (ii) a methodology for comparing two sets of curve fragments which takes into account the instabilities inherent in the formation of curve fragments. The approach compares two curve fragment sets by exploring deformation of one onto another while traversing discontinuous transitions. The geodesic paths in this space represent the best matching between the two sets of contour fragments. This approach is used to compare the results of edge linkers on the new contour fragment human ground truth.
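The claim that edge-based evaluation ignores grouping can be illustrated with a minimal sketch. It is not the evaluation protocol or the matching method of the paper: the toy coordinates, the exact-pixel matching rule (no distance tolerance), and the names `edge_pixels` and `edge_f_measure` are all assumptions made for illustration. The sketch scores three groupings of the same nine edge pixels and shows that a purely edge-based F-measure cannot tell them apart.

```python
from typing import List, Set, Tuple

Point = Tuple[int, int]
Curve = List[Point]  # an ordered list of pixels forming one curve fragment


def edge_pixels(curves: List[Curve]) -> Set[Point]:
    """Flatten a set of curve fragments into an unordered set of edge pixels."""
    return {p for curve in curves for p in curve}


def edge_f_measure(detected: List[Curve], truth: List[Curve]) -> float:
    """Edge-based F-measure using exact pixel matches (an illustrative simplification)."""
    d, t = edge_pixels(detected), edge_pixels(truth)
    hits = len(d & t)
    if hits == 0:
        return 0.0
    precision = hits / len(d)
    recall = hits / len(t)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: two fragments meeting at a corner.
horizontal = [(x, 0) for x in range(5)]      # (0,0) .. (4,0)
vertical = [(4, y) for y in range(1, 5)]     # (4,1) .. (4,4)

truth = [horizontal, vertical]                    # two separate fragments
merged = [horizontal + vertical]                  # wrongly linked into one fragment
shattered = [[p] for p in horizontal + vertical]  # nine one-pixel fragments

score_merged = edge_f_measure(merged, truth)
score_shattered = edge_f_measure(shattered, truth)
print(score_merged, score_shattered)
```

Both groupings cover exactly the ground-truth pixels, so the edge-based score is perfect in each case even though neither reproduces the two ground-truth fragments. A fragment-level comparison, such as the one the abstract describes, would have to penalize the merge and the fragmentation explicitly.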