Pipelining and wave-pipelining (WP) have been proposed in the literature for increasing the operating frequency of digital circuits. In general, pipelining yields higher speed at the cost of increased area and clock routing complexity. WP, on the other hand, requires less area and less clock routing complexity, but allows the circuit to operate only at moderate speeds. In this paper, a hybrid wave-pipelining scheme is proposed to obtain the benefits of both techniques. The major contributions of this paper are: (i) the implementation of the 2D discrete wavelet transform (DWT) using the lifting scheme with hybrid wave-pipelining, and (ii) the automation of the choice of clock frequency and of clock skew between the input and output registers of a wave-pipelined circuit, using built-in self-test (BIST) and system-on-chip (SOC) approaches. In the hybrid scheme, the individual lifting blocks are implemented using WP and are interconnected through pipeline registers. To evaluate the proposed schemes, a system for computing the one-level 2D DWT is implemented using three techniques: pipelining, non-pipelining and hybrid wave-pipelining. The BIST approach is used for the implementation on a Xilinx Spartan-II device. The SOC approach is adopted for implementation on Altera and Xilinx field-programmable gate array (FPGA) based SOC kits with Nios II or MicroBlaze soft-core processors. The implementation results show that the hybrid wave-pipelined circuit is faster than the non-pipelined circuit by a factor of 1.25-1.39. The pipelined circuit is in turn faster than the hybrid wave-pipelined circuit by a factor of 1.15-1.38, but this is achieved at the cost of an increase in the number of registers by a factor of 1.79-3.15 and in the number of logic elements (LEs) by a factor of 1.11-1.65.
The soft-core processor based automation scheme considerably reduces the effort required for the design and testing of the hybrid wave-pipelined circuit. The techniques proposed in this paper are also applicable to ASICs, and the optimization schemes are also applicable to the computation of other image transforms such as the DCT and DHT.
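For readers unfamiliar with the lifting scheme, the computation of a one-level 2D DWT can be illustrated by a small software model. The sketch below is only an illustration under stated assumptions: it uses the reversible integer 5/3 filter with symmetric boundary extension, which is an assumption and not necessarily the filter used in this work, and the function names (`lift_53_forward`, `lift_53_inverse`, `dwt2_one_level`) are hypothetical. It models the arithmetic of the predict and update lifting blocks only; it does not model pipelining, wave-pipelining, or any timing behaviour.

```python
def lift_53_forward(x):
    """One level of 1D integer 5/3 lifting on an even-length list of ints.

    Assumed filter (not taken from the paper). Returns (s, d):
    the low-pass (approximation) and high-pass (detail) coefficients.
    """
    n = len(x)
    assert n >= 2 and n % 2 == 0, "even-length input assumed"
    even, odd = x[0::2], x[1::2]
    half = n // 2
    # Predict block: detail = odd sample minus the mean of its even neighbours.
    # The right neighbour is mirrored at the boundary (symmetric extension).
    d = [odd[i] - ((even[i] + even[min(i + 1, half - 1)]) >> 1)
         for i in range(half)]
    # Update block: approximation = even sample plus a rounded quarter of the
    # two adjacent details; the left detail is mirrored at the boundary.
    s = [even[i] + ((d[max(i - 1, 0)] + d[i] + 2) >> 2)
         for i in range(half)]
    return s, d


def lift_53_inverse(s, d):
    """Undo lift_53_forward by running the lifting steps in reverse order."""
    half = len(s)
    even = [s[i] - ((d[max(i - 1, 0)] + d[i] + 2) >> 2)
            for i in range(half)]
    odd = [d[i] + ((even[i] + even[min(i + 1, half - 1)]) >> 1)
           for i in range(half)]
    x = [0] * (2 * half)
    x[0::2] = even
    x[1::2] = odd
    return x


def dwt2_one_level(img):
    """One-level 2D DWT: 1D lifting on every row, then on every column.

    img is a list of equal-length rows with even height and width.
    Returns the four subbands (ll, lh, hl, hh) as lists of rows.
    """
    row_low, row_high = [], []
    for row in img:
        s, d = lift_53_forward(list(row))
        row_low.append(s)
        row_high.append(d)

    def lift_columns(block):
        # Transpose, lift each column, then transpose the results back.
        pairs = [lift_53_forward(list(col)) for col in zip(*block)]
        low = [list(r) for r in zip(*[p[0] for p in pairs])]
        high = [list(r) for r in zip(*[p[1] for p in pairs])]
        return low, high

    ll, lh = lift_columns(row_low)
    hl, hh = lift_columns(row_high)
    return ll, lh, hl, hh
```

In this model each lifting step depends only on the output of the previous one, which is the block structure the hybrid scheme exploits: in hardware, each such block would be a wave-pipelined combinational stage, with pipeline registers placed between blocks.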