Visual statistical learning (VSL) has been proposed as a powerful mechanism underlying the striking ability of human observers to handle complex visual environments. Previous studies have shown that VSL can occur when statistical information is embedded at multiple levels of abstraction, such as different semantic category levels. In the present study, we further examined whether statistical regularities at the basic category level (e.g., a regular sequence of a bird, then a car, and then a dog) could influence the ability to extract statistical regularities at the subordinate level (e.g., a regular sequence of a parrot, then a sports car, and then an Eskimo dog). In the familiarization phase, participants were exposed to a stream of real-world images whose semantic categories followed temporal regularities. Importantly, depending on the experimental condition, the temporal regularities existed either at both the basic and subordinate levels or at the subordinate level only. After familiarization, participants performed a surprise two-alternative forced-choice (2AFC) familiarity judgment between two triplets in which the temporal regularities were either preserved or not preserved. Our results showed that the existence of statistical regularities at the basic level did not influence VSL at the subordinate level. Subsequent experiments yielded the same result even when the basic-level categories had to be explicitly recognized and when the stimuli were not easily categorized at their subordinate level. Our results suggest that, when patterns are present at multiple levels simultaneously, VSL is constrained to learn the patterns at one particular level.
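To make the notion of a temporal regularity concrete, the following is a minimal sketch of how a triplet-based familiarization stream might be built and how its transitional probabilities could be checked. It is an illustration under stated assumptions, not the procedure used in the study: the item labels, the number of triplets, the number of blocks, the rule against immediate triplet repetition, and the function names `make_stream` and `transitional_probabilities` are all hypothetical.

```python
import random
from collections import Counter

# Hypothetical subordinate-level triplets. Letters stand in for images;
# the study's actual stimuli and triplet count are not given here.
TRIPLETS = [
    ("A", "B", "C"),
    ("D", "E", "F"),
    ("G", "H", "I"),
    ("J", "K", "L"),
]


def make_stream(triplets, n_blocks, seed=0):
    """Concatenate shuffled blocks of triplets into one item stream.

    Each block contains every triplet once. A block is reshuffled until
    its first triplet differs from the last triplet of the previous
    block, so no triplet ever repeats immediately (an assumed constraint).
    """
    rng = random.Random(seed)
    stream, last = [], None
    for _ in range(n_blocks):
        block = list(triplets)
        rng.shuffle(block)
        while block[0] == last:
            rng.shuffle(block)
        for triplet in block:
            stream.extend(triplet)
        last = block[-1]
    return stream


def transitional_probabilities(stream):
    """Return P(next item | current item) for every adjacent pair observed."""
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])
    return {
        (a, b): n / first_counts[a] for (a, b), n in pair_counts.items()
    }


stream = make_stream(TRIPLETS, n_blocks=50)
tp = transitional_probabilities(stream)
```

In such a stream, transitions inside a triplet (e.g., `("A", "B")`) have a transitional probability of 1, whereas transitions across triplet boundaries are less predictable. This difference is the kind of statistical cue a learner could exploit in a later familiarity test.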