To investigate scene segmentation in the visual system, we present a model of two reciprocally connected visual areas using spiking neurons. Area P corresponds to the orientation-selective subsystem of the primary visual cortex, while the central visual area C is modeled as an associative memory representing stimulus objects according to Hebbian learning. Without feedback from area C, a single stimulus results in relatively slow and irregular activity that is synchronized only for neighboring patches (slow state), whereas in the complete model activity is faster and the synchronization range is enlarged (fast state). When a superposition of several stimulus objects is presented, scene segmentation occurs on a time scale of hundreds of milliseconds through alternating epochs of the slow and fast states, with neurons representing the same object being in the fast state simultaneously. Correlation analysis reveals synchronization on different time scales, as found in experiments (designated as tower, castle, and hill peaks). On the fast time scale (tower peaks, gamma frequency range), recordings from two sites coding different objects yield flat correlograms, while recordings from two sites coding the same object yield correlograms with oscillatory modulation and a central peak. This agrees with experimental findings, whereas standard phase-coding models would predict shifted peaks in the case of different objects.
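The correlation analysis referred to above rests on standard spike-train cross-correlograms: a histogram of pairwise lags between spikes recorded at two sites. The following minimal NumPy sketch is illustrative only; the function name, bin width, correlation window, and the synthetic 40 Hz spike trains are our assumptions and are not taken from the model described here.

```python
import numpy as np

def cross_correlogram(spikes_a, spikes_b, bin_ms=1.0, window_ms=100.0):
    """Histogram of lags t_b - t_a over all spike pairs within
    +/- window_ms (spike times given in milliseconds)."""
    spikes_a = np.asarray(spikes_a, dtype=float)
    spikes_b = np.asarray(spikes_b, dtype=float)
    lags = spikes_b[None, :] - spikes_a[:, None]   # all pairwise lags
    lags = lags[np.abs(lags) <= window_ms]
    edges = np.arange(-window_ms, window_ms + bin_ms, bin_ms)
    counts, _ = np.histogram(lags, bins=edges)
    centers = edges[:-1] + bin_ms / 2.0
    return centers, counts

# Illustrative use: two gamma-locked (~40 Hz) trains with small jitter
# should give an oscillatory correlogram with a central peak (same
# object), while independent trains should give an essentially flat one
# (different objects).
rng = np.random.default_rng(0)
base = np.arange(0.0, 1000.0, 25.0)              # 40 Hz rhythm over 1 s
same_a = base + rng.normal(0.0, 2.0, base.size)  # jittered copies
same_b = base + rng.normal(0.0, 2.0, base.size)
centers, counts = cross_correlogram(same_a, same_b)
```

Under these assumptions, a peak of `counts` at zero lag with side peaks at multiples of the common oscillation period (here 25 ms) corresponds to the central-peak correlograms described above, whereas a phase-coding scheme would shift the main peak away from zero for sites coding different objects.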