Virtual Output Queuing (VOQ) is widely used by high-speed packet switches to overcome head-of-line blocking, with cells scheduled across the switch by a matching algorithm. In fixed-length VOQ switches, variable-length IP packets are segmented into fixed-length cells at the inputs. When a cell is transferred to its destination output, it stays in the reassembly buffer and waits for the other cells of the same packet before the entire packet can depart the system. The delay a packet suffers in the system therefore includes both the waiting time in the VOQ (the widely studied cell delay) and the waiting time in the output reassembly buffer (the reassembly delay, which many papers ignore). Among all existing matching algorithms, Maximum Weight Matching (MWM) has the lowest average cell delay. In this paper, we investigate the average packet delay, one of the key performance measures for an input-buffered packet switch. A new class of matching algorithms, PDA-MWM, is defined and proved to be stable under all admissible traffic. Three PDA-MWM matching algorithms are studied by simulation. We show that achieving low packet delay involves a trade-off between cell delay performance and reassembly delay performance. If both are carefully considered, a matching scheme can greatly reduce the packet delay compared to MWM.
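For readers unfamiliar with MWM, the following minimal Python sketch illustrates the idea on a small switch. It is not the PDA-MWM class studied in this paper. It assumes that the weight of each VOQ is its queue length in cells (one common choice of weight), and it finds the matching by brute force over all permutations, which is practical only for very small port counts. The function name `max_weight_matching` and the example occupancy matrix are hypothetical and chosen for illustration only.

```python
from itertools import permutations

def max_weight_matching(voq_len):
    """Brute-force MWM for an N x N switch.

    voq_len[i][j] is the assumed weight of the VOQ at input i holding
    cells for output j (here: its queue length in cells).
    Returns (match, weight), where match[i] is the output paired with input i.
    """
    n = len(voq_len)
    best_match, best_weight = None, -1
    # Each permutation pairs every input with a distinct output.
    for perm in permutations(range(n)):
        weight = sum(voq_len[i][perm[i]] for i in range(n))
        if weight > best_weight:
            best_match, best_weight = perm, weight
    return best_match, best_weight

# Hypothetical VOQ occupancies for a 3 x 3 switch.
voq_len = [
    [3, 0, 1],
    [2, 4, 0],
    [0, 1, 5],
]
match, weight = max_weight_matching(voq_len)
print(match, weight)
```

In this example the heaviest matching pairs each input with the output of the same index, for a total weight of 12. Note that the sketch looks only at queue lengths and has no notion of which cells belong to the same packet, so it says nothing about reassembly delay.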