Displacement plays a crucial role in structural health monitoring, but measuring structural displacement accurately remains challenging. Recent studies have estimated structural displacement by fusing vision camera and accelerometer measurements. Because of hardware limitations and computational cost, vision measurements are commonly acquired at a low sampling rate. A low sampling rate, however, can introduce temporal aliasing into the vision measurements, which in turn can produce large displacement errors. In this study, we propose a finite impulse response (FIR) filter-based technique that estimates structural displacement from acceleration measurements sampled at a high rate and vision measurements sampled at a low rate and affected by temporal aliasing. By explicitly eliminating the error induced by temporal aliasing, the technique significantly improves displacement estimation accuracy compared to existing FIR filter-based techniques. The proposed technique was experimentally validated on a single-story building model. The results show that its displacement estimation performance was insensitive to the sampling rate of the vision measurements, and that structural displacement was accurately estimated even when temporal aliasing was present in the vision measurements.
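As a brief illustration of the temporal aliasing referred to above, the following minimal sketch shows how a vibration component above half the camera sampling rate becomes indistinguishable from a lower-frequency one. The numbers (a 12 Hz vibration and a 10 Hz camera) and the helper names `sample_sine` and `alias_frequency` are hypothetical choices for illustration only; they are not taken from the experiment, and the sketch does not implement the proposed FIR filter-based technique.

```python
import math


def sample_sine(freq_hz, fs_hz, n_samples):
    """Sample sin(2*pi*f*t) at sampling rate fs_hz for n_samples points."""
    return [math.sin(2.0 * math.pi * freq_hz * k / fs_hz) for k in range(n_samples)]


def alias_frequency(freq_hz, fs_hz):
    """Apparent frequency of a real sinusoid after sampling at fs_hz."""
    folded = freq_hz % fs_hz
    return min(folded, fs_hz - folded)


# Hypothetical values: a 12 Hz vibration observed by a 10 Hz camera.
true_hz, camera_fs = 12.0, 10.0
apparent_hz = alias_frequency(true_hz, camera_fs)

# The 12 Hz signal and a signal at the apparent frequency yield the same
# samples at the camera rate, so the camera alone cannot tell them apart.
aliased = sample_sine(true_hz, camera_fs, 50)
impostor = sample_sine(apparent_hz, camera_fs, 50)
max_diff = max(abs(a - b) for a, b in zip(aliased, impostor))

print(f"apparent frequency: {apparent_hz} Hz")
print(f"largest sample difference: {max_diff:.2e}")
```

In this example the camera samples of the 12 Hz motion coincide with those of a 2 Hz motion up to floating-point rounding, which is the kind of ambiguity that can lead to large displacement errors when low-rate vision measurements are used without correction.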