To address the long waiting delays that bottleneck intra 4×4 prediction, this paper introduces an architecture of computing cells for intra 4×4 luma block prediction. It comprises a SKEW computing cell, which employs four 2-level serial adders and two single-level adders; two DC computing cells, which employ two 4-input adders; and a bypass unit for exporting reference pixel values. The RTL model of our design is implemented in Verilog HDL, and Synplify synthesis results show that the intra 4×4 prediction can operate at up to 162.8 MHz on an Altera Stratix III EP3SL150F1152C2N device. The design requires 280 cycles in total to complete the prediction tasks and consumes 15415 logic elements and 28 KB of memory bits. Compared with similar designs, our architecture requires fewer processing cycles and lower hardware cost.
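As an illustration of the arithmetic that such computing cells carry out, the following Python sketch models the DC mean and the 2-tap and 3-tap interpolation filters as they are commonly defined for H.264/AVC intra 4×4 luma prediction. It is a behavioural sketch only, not the RTL described here. The function names `dc_predict`, `filt3`, and `filt2` are hypothetical, and it is an assumption that the SKEW cell corresponds to the diagonal modes built from these filters and that the two 4-input adders sum the top and left reference pixels.

```python
def dc_predict(top=None, left=None):
    """DC value for a 4x4 block from optional 4-pixel top/left neighbours."""
    if top is not None and left is not None:
        # Two 4-input sums (top and left), combined and rounded.
        return (sum(top) + sum(left) + 4) >> 3
    if top is not None:
        return (sum(top) + 2) >> 2
    if left is not None:
        return (sum(left) + 2) >> 2
    # No neighbours available: mid-grey for 8-bit samples.
    return 128


def filt3(a, b, c):
    """3-tap filter used by diagonal modes: two chained additions plus rounding."""
    return (a + 2 * b + c + 2) >> 2


def filt2(a, b):
    """2-tap filter used by some diagonal modes: one addition plus rounding."""
    return (a + b + 1) >> 1
```

For example, `dc_predict([10, 20, 30, 40], [50, 60, 70, 80])` evaluates `(100 + 260 + 4) >> 3`, and `filt3(10, 20, 30)` evaluates `(10 + 40 + 30 + 2) >> 2`. In this reading, the chained additions inside `filt3` would map naturally onto a 2-level serial adder and `filt2` onto a single-level adder.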