Neighborhood image processing operations on Field Programmable Gate Array (FPGA) are considered as memory intensive operations. Large memory bandwidth is required to transfer the required pixel data from external memory to the processing unit. On-chip image buffers are employed to reduce this data transfer rate. Conventional image buffers, implemented either by using FPGA logic resources or embedded memories are resource inefficient. They exhaust the limited FPGA resources quickly. Consequently, hardware implementation of neighborhood operations becomes expensive, and integrating them in resource constrained devices becomes unfeasible. This paper presents a resource efficient FPGA based on-chip buffer architecture. The proposed architecture utilizes full capacity of a single Xilinx Block-RAM (BRAM36 primitive) for storing multiple rows of input image. To get multiple pixels/clock in a user defined scan order, an efficient duty-cycle based memory accessing technique is coupled with a customized addressing circuitry. This accessing technique exploits switching capabilities of BRAM to read 4 pixels in a single clock cycle without degrading system frequency. The addressing circuitry provides multiple pixels/clock in any user defined scan order to implement a wide range of neighborhood operations. With the saving of 83% BRAM resources, the buffer architecture operates at 278 MHz on Xilinx Artix-7 FPGA with an efficiency of 1.3 clock/pixel. It is thus capable to fulfill real time image processing requirements for HD image resolution (1080×1920) @103 fps.