Whether face gender perception relies on holistic (whole-face) or featural (part-based) information remains controversial. Although neuroimaging studies have identified brain regions related to face gender perception, the temporal dynamics of this process remain under debate. Here, we investigated the mechanism and temporal dynamics of face gender perception. We used stereoscopic depth manipulation to create two conditions: a front condition and a behind condition. In the front condition, facial patches were presented stereoscopically in front of the occluder, and participants perceived them as disjoint parts (featural cues). In the behind condition, facial patches were presented stereoscopically behind the occluder and were amodally completed and unified into a coherent face (holistic cues). We performed three behavioral experiments and one electroencephalography (EEG) experiment, and compared the results of the front and behind conditions. We found faster reaction times (RTs) in the behind condition than in the front condition, and observed priming effects and aftereffects only in the behind condition. Moreover, the EEG experiment revealed that face gender perception is processed in a relatively late phase of visual recognition (200–285 ms). Our results indicate that holistic information is critical for face gender perception, and that this process occurs at a relatively late latency.