Visual environments are complex. To process the information they provide, the visual system adopts strategies that reduce this complexity. One strategy, called visual statistical learning (VSL), is to extract statistical regularities from the environment. Another strategy is to use the hierarchical structure of a scene (e.g., the co-occurrence between local and global information). Through a series of experiments, this study investigated whether statistical regularities and hierarchical structure can be used together to reduce the complexity of a scene. In the familiarization phase, participants passively viewed a stream of hierarchical scenes in which shapes were presented concurrently at the local and global levels. At each of the two levels, temporal regularities existed among three shapes (a triplet), which always appeared in the same order. In the test phase, participants compared the familiarity of two triplets, whose temporal regularities were either preserved or not. We found that participants extracted the temporal regularities at both the local and global levels (Experiment 1). The hierarchical structure influenced the ability to extract the temporal regularities (Experiment 2). Specifically, VSL was either enhanced or impaired depending on whether the hierarchical structure was informative or not. In summary, to process a complex scene, the visual system flexibly uses both the statistical regularities and the hierarchical structure of the scene.
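The temporal regularities described above can be illustrated with a minimal sketch. It is not the stimulus code used in the study. The shape labels, the number of triplets per level, the number of repetitions, the ban on immediate triplet repeats, and the independent generation of the two levels are all assumptions made for illustration. The sketch builds one triplet-structured stream per level, pairs them into hierarchical scenes, and computes transitional probabilities, a common way to quantify such regularities.

```python
import random
from collections import Counter


def make_triplets(shapes, size=3):
    """Group a flat list of shape labels into fixed-order triplets."""
    return [tuple(shapes[i:i + size]) for i in range(0, len(shapes), size)]


def build_stream(triplets, n_triplets, rng):
    """Concatenate randomly chosen triplets, never repeating one back to back.

    The no-repeat constraint is an assumption, not taken from the source.
    """
    stream, previous = [], None
    for _ in range(n_triplets):
        current = rng.choice([t for t in triplets if t != previous])
        stream.extend(current)
        previous = current
    return stream


def transitional_probabilities(stream):
    """Estimate P(next shape | current shape) from adjacent pairs in the stream."""
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical labels: uppercase for local shapes, lowercase for global shapes.
local_triplets = make_triplets(list("ABCDEFGHIJKL"))
global_triplets = make_triplets(list("abcdefghijkl"))

# 96 triplets per level gives 288 scenes; the count is an arbitrary choice.
local_stream = build_stream(local_triplets, 96, rng)
global_stream = build_stream(global_triplets, 96, rng)

# Each scene pairs one global shape with one local shape shown at the same time.
scenes = list(zip(global_stream, local_stream))

tp = transitional_probabilities(local_stream)
within = tp[("A", "B")]              # transition inside a triplet
across = tp.get(("C", "D"), 0.0)     # transition across a triplet boundary
print(f"within-triplet TP: {within:.2f}, across-boundary TP: {across:.2f}")
```

In this sketch, transitions inside a triplet are fully predictable, while transitions across triplet boundaries are not. That contrast is the kind of temporal regularity a learner could extract at each level. Because the two streams are generated independently here, the shape at one level carries no information about the shape at the other.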