Recent studies have shown that auditory scene analysis involves distributed neural sites below, in, and beyond the auditory cortex (AC). However, it remains unclear what role each site plays and how they interact in the formation and selection of auditory percepts. We addressed this issue through perceptual multistability phenomena, namely, spontaneous perceptual switching in auditory streaming (AS) for a sequence of repeated triplet tones, and perceptual changes for a repeated word, known as verbal transformations (VTs). An event-related fMRI analysis revealed brain activity timelocked to perceptual switching in the cerebellum for AS, in frontal areas for VT, and the AC and thalamus for both. The results suggest that motor-based prediction, produced by neural networks outside the auditory system, plays essential roles in the segmentation of acoustic sequences both in AS and VT. The frequency of perceptual switching was determined by a balance between the activation of two sites, which are proposed to be involved in exploring novel perceptual organization and stabilizing current perceptual organization. The effect of the gene polymorphism of catechol-
O
-methyltransferase (COMT) on individual variations in switching frequency suggests that the balance of exploration and stabilization is modulated by catecholamines such as dopamine and noradrenalin. These mechanisms would support the noteworthy flexibility of auditory scene analysis.