Drivers of automated cars may occasionally need to take back manual control after a period of inattentiveness. At present, it is unknown how long it takes to build up situation awareness of a traffic situation. In this study, 34 participants were presented with animated video clips of traffic situations on a three-lane road, shown from an egocentric viewpoint on a monitor equipped with an eye tracker. Each participant viewed 24 videos of different durations (1, 3, 7, 9, 12, or 20 s). After each video, participants reproduced the end of the video by placing cars in a top-down view, and indicated the speeds of the placed cars relative to the ego-vehicle. Results showed that the longer the video, the lower the absolute error in the number of placed cars, the lower the total distance error between the placed cars and the actual cars, and the lower the geometric difference between the placed cars and the actual cars. These effects appeared to saturate at video lengths of 7-12 s. The total speed error between placed and actual cars also decreased with video length, but showed no saturation up to 20 s. Glance frequencies to the mirrors decreased with observation time, which is consistent with the notion that participants first estimated the spatial pattern of cars, after which they directed their attention to individual cars. In conclusion, observers are able to reproduce the layout of a situation quickly, but the assessment of relative speeds takes 20 s or more.