Quantum computers are based on quantum physics and can exploit quantum effects to speed up computation. However, analyzing quantum algorithms is often hard: they are, in some sense, parallel algorithms (a property called quantum parallelism), and their behavior is unintuitive. Simulation can be a good way to analyze the performance of quantum algorithms, and for quantum heuristic algorithms in particular, simulation-based analysis is the only approach. However, simulating quantum algorithms is computationally demanding, since it requires memory that grows exponentially with the number of qubits and frequent read/write access to that memory. It is therefore important to develop memory-efficient simulation algorithms and architectures. In this paper, we propose a fast hardware simulator architecture for the Walsh-Hadamard transform, which is at the core of many quantum algorithms, including quantum heuristic algorithms. We developed a method that divides the whole computation of the Walsh-Hadamard transform into pieces and processes them in a pipelined manner. By arranging the data flow and using carefully designed address computation, the pipeline runs without stalls. The proposed method is also memory efficient.
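
To illustrate how the Walsh-Hadamard transform can be divided into pieces, the following is a minimal software sketch of the standard butterfly decomposition, in which an n-qubit transform is computed in n stages and each stage pairs addresses that differ in a single bit. It is not the proposed hardware architecture; the function name `walsh_hadamard` and the in-memory list representation are assumptions made for illustration only.

```python
import math


def walsh_hadamard(amplitudes):
    """Return the normalized Walsh-Hadamard transform of a length-2**n sequence.

    Illustrative sketch only: a plain software butterfly, not the pipelined
    hardware design described in the text.
    """
    size = len(amplitudes)
    if size == 0 or size & (size - 1):
        raise ValueError("length must be a power of two")

    state = list(amplitudes)
    stride = 1
    while stride < size:  # one stage per qubit
        for base in range(0, size, 2 * stride):
            for offset in range(stride):
                lo = base + offset
                hi = lo + stride  # partner address differs in exactly one bit
                a, b = state[lo], state[hi]
                state[lo] = a + b
                state[hi] = a - b
        stride *= 2

    scale = 1.0 / math.sqrt(size)  # normalization factor 1/sqrt(2**n)
    return [x * scale for x in state]
```

Each stage reads and writes every amplitude once, so the memory traffic per stage is proportional to the state size. This is the access pattern that a pipelined design would need to schedule without stalls.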