This paper presents a new architecture for the hardware implementation of an H.264/MPEG-4 AVC deblocking filter. The architecture adopts a 4-stage pipelined structure in the edge filter. The proposed approach redesigns the internal filter datapaths and the memory organization to reduce both the total number of processing cycles per macroblock and the chip area. A comparison with previous related work indicates that the proposed architecture offers significant gains for practical implementation in embedded systems, which are commonly constrained in power consumption, clock speed, and memory capacity.