This paper proposes a video coding framework for apron surveillance scenes that improves coding efficiency by eliminating long-term redundancy at the object level. To achieve this goal, this study first extends an existing block-based hybrid video coding framework to exploit object-level redundancy during coding. Second, an object-library mechanism is designed to collect representative object images as coding references over larger temporal and spatial scales. Finally, a virtual reference frame, which blends background and foreground references from the object library, is adaptively composited according to the video content to improve inter-prediction performance. Preliminary experimental results show that the proposed method achieves a Bjøntegaard delta rate (BD-rate) reduction of up to 23.97% on apron surveillance video sequences compared with the High Efficiency Video Coding (HEVC) standard.
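To make the idea of a virtual reference frame more concrete, the following is a minimal sketch of the compositing step only. It assumes that a background reference and a set of foreground objects (each with a pixel patch, a binary mask, and a placement position) have already been selected from the object library. The names `LibraryObject` and `composite_virtual_reference` and the "foreground overwrites background" rule are hypothetical illustrations, not the method described in this paper. In particular, the adaptive, content-dependent selection of references is not modeled here.

```python
from dataclasses import dataclass
from typing import List

# A frame is modeled as a 2-D grid of luma samples (row-major).
Frame = List[List[int]]


@dataclass
class LibraryObject:
    """Hypothetical object-library entry: a foreground patch and an assumed placement."""
    patch: Frame              # luma samples of the stored object image
    mask: List[List[bool]]    # True where the sample belongs to the object
    top: int                  # row in the virtual frame where the patch is placed
    left: int                 # column in the virtual frame where the patch is placed


def composite_virtual_reference(background: Frame, objects: List[LibraryObject]) -> Frame:
    """Paste masked foreground patches onto a copy of the background reference.

    Assumption (illustrative only): objects are applied in list order, and a
    foreground sample simply replaces the background sample beneath it.
    """
    height, width = len(background), len(background[0])
    # Copy the background so the stored library entry is left unmodified.
    virtual = [row[:] for row in background]
    for obj in objects:
        for dy, (patch_row, mask_row) in enumerate(zip(obj.patch, obj.mask)):
            y = obj.top + dy
            if not 0 <= y < height:
                continue  # skip patch rows that fall outside the frame
            for dx, (sample, inside) in enumerate(zip(patch_row, mask_row)):
                x = obj.left + dx
                if inside and 0 <= x < width:
                    virtual[y][x] = sample  # foreground overrides background
    return virtual
```

For example, compositing a 2x2 object with one masked-out sample at position (1, 1) onto a flat 4x4 background changes only the three masked-in samples. An encoder could then treat the resulting frame as one more candidate reference for inter prediction.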