Two hypotheses have been proposed to explain how visual working memory (VWM) representations are formed during consolidation: an all-or-none process hypothesis and a coarse-to-fine process hypothesis. However, neither hypothesis alone clearly specifies how VWM representations are formed during consolidation. In the current study, we propose a two-stage process hypothesis to reconcile them. The two-stage process hypothesis suggests that coarse information is consolidated in an all-or-none manner in the early consolidation stage, whereas detailed information is consolidated in a coarse-to-fine manner in the late consolidation stage. We systematically manipulated the encoding time of the memory stimuli, asking participants to memorize one (Experiment 1) or two (Experiment 2) orientations under different encoding time intervals. We found that the memory rate increased linearly as the encoding time increased. More importantly, VWM precision remained constant when the encoding time was short, whereas it increased linearly with encoding time once the encoding time was sufficient. These results support the two-stage process hypothesis, which reconciles previous conflicting findings in the literature.
We rely heavily on visual information to meet the needs of serial cognitive tasks 1. Visual stimuli from the external world are first transformed into perceptual representations. However, perceptual representations are unstable and susceptible to interference from new information, so they need to be transformed into a more stable form of visual information. This more stable form is the visual working memory (VWM, also known as short-term memory) representation, and the process of forming this memory representation is called VWM consolidation 2.
Recent studies on the consolidation of VWM have investigated the time course of consolidation 2,3, the bandwidth of consolidation 4-6, and differences in the consolidation mechanisms of various visual features 7-9. For example, by presenting masks immediately after the offset of the memory stimuli, researchers have manipulated the time available to participants for encoding the stimuli, thereby indirectly controlling the time allowed for VWM consolidation 4-10. However, no consensus has yet been reached on how VWM representations are formed during consolidation. Two hypotheses have been proposed: an all-or-none process hypothesis and a coarse-to-fine process hypothesis. The all-or-none hypothesis suggests that, when a perceptual representation is consolidated into a VWM representation, the full representation is created directly; if the encoding time is insufficient, the consolidation process fails 11. Conversely, the coarse-to-fine hypothesis suggests that VWM representations are formed through a gradual transition from rough representations to high-precision representations 12. Previous studies ...