We propose the Adaptive Scene Dependent Filter (ASDF) hierarchy for unsupervised learning of image segmentation, which integrates several processing pathways into a flexible, highly dynamic, and real-time capable vision architecture. It is based on forming a combined feature space from basic feature maps such as color, disparity, and pixel position. To guarantee real-time performance, we apply an enhanced vector quantization method to partition this feature space. The learned codebook defines a corresponding best-match segment for each prototype and yields an oversegmentation of the object and its surround. The segments are recombined into a final object segmentation mask based on a relevance map, which encodes a coarse bottom-up hypothesis of where the object is located in the image. We apply the ASDF hierarchy to preprocess input images in a feature-based, biologically motivated object recognition learning architecture and show experiments with this real-time vision system running at 6 Hz, including the online learning of the segmentation. Because interaction with the user is not perfect, the real-world system effectively acquires useful views at only about 1.5 Hz, but we show that for training a new object, one hundred views, requiring only one minute of interaction time, are sufficient.
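The processing chain described above (combined feature space, codebook learning, best-match segments, relevance-based recombination) can be illustrated with a minimal sketch. It is not the ASDF implementation: plain k-means with a deterministic farthest-point initialization stands in for the enhanced vector quantization method, whose details are not given here, and the recombination rule (keep a segment if its mean relevance exceeds a threshold) is an assumption. All function and parameter names (`segment_by_vq`, `n_prototypes`, `min_relevance`) are hypothetical.

```python
import numpy as np


def farthest_point_init(feats, k):
    """Deterministic codebook init: repeatedly pick the point farthest from all chosen ones."""
    chosen = [0]
    dist = ((feats - feats[0]) ** 2).sum(axis=1)
    for _ in range(k - 1):
        nxt = int(dist.argmax())
        chosen.append(nxt)
        dist = np.minimum(dist, ((feats - feats[nxt]) ** 2).sum(axis=1))
    return feats[chosen].copy()


def segment_by_vq(color, disparity, relevance,
                  n_prototypes=3, n_iters=10, min_relevance=0.5):
    """color: (H, W, 3); disparity, relevance: (H, W). Returns (labels, object_mask)."""
    h, w = disparity.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Combined feature space: color, disparity, and normalized pixel position.
    feats = np.concatenate(
        [color.reshape(-1, 3),
         disparity.reshape(-1, 1),
         xs.reshape(-1, 1) / w,
         ys.reshape(-1, 1) / h],
        axis=1,
    ).astype(float)

    # Codebook learning (plain k-means as a stand-in for the enhanced VQ).
    codebook = farthest_point_init(feats, n_prototypes)
    for _ in range(n_iters):
        d = ((feats[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for k in range(n_prototypes):
            members = feats[labels == k]
            if len(members):  # leave prototypes of empty segments unchanged
                codebook[k] = members.mean(axis=0)

    # Best-match segment per prototype: an oversegmentation of the image.
    d = ((feats[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    labels = d.argmin(axis=1).reshape(h, w)

    # Recombination (assumed rule): keep segments that lie mostly inside
    # the coarse relevance hypothesis.
    mask = np.zeros((h, w), dtype=bool)
    for k in range(n_prototypes):
        seg = labels == k
        if seg.any() and relevance[seg].mean() > min_relevance:
            mask |= seg
    return labels, mask
```

In this sketch the relevance map only needs to cover the object roughly; the sharp object boundary comes from the segments themselves, which is the reason for oversegmenting first and recombining afterwards.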