Multi-dimensional packet classification is a key function for supporting diverse services in next-generation routers. Such routers need both a memory-efficient data structure that scales to larger rule tables and a hardware architecture that achieves higher throughput. In this paper, we propose a parallel and pipelined architecture called Set-Pruning Segment Trees with Buckets (SPSTwB) for multi-dimensional packet classification. SPSTwB significantly reduces rule duplication by means of a novel partitioning scheme and an efficient bucket merging scheme. The key feature of the proposed architecture is that memory consumption is reduced significantly regardless of the characteristics of the rule table. In addition, the logic of each pipeline stage is simplified because rule IDs and priorities are not stored, which allows the pipeline to run at a high clock rate. The proposed scheme needs less than 20 bytes per rule for various ClassBench-generated rule tables of 100 K rules, and it supports fast incremental rule updates. Implemented on a Xilinx Virtex-7 FPGA with dual-ported Block RAM, the proposed pipelined architecture achieves a throughput of 134 Gbps.
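
To put the reported figures in perspective, the following back-of-the-envelope sketch converts them into a total memory bound and an implied clock rate. Only the 20 bytes per rule, the 100 K rules, and the 134 Gbps come from the text above. The 40-byte minimum packet size, the two lookups per clock cycle (one per Block RAM port), and all function names are illustrative assumptions, not values stated here.

```python
# Rough sanity check of the reported figures.
# ASSUMPTIONS (not stated in the text): 40-byte minimum-size packets and
# two packet lookups per clock cycle (one per port of the dual-ported BRAM).

def memory_upper_bound_bytes(num_rules: int, bytes_per_rule: float) -> float:
    """Upper bound on rule-table memory for a given per-rule cost."""
    return num_rules * bytes_per_rule


def throughput_gbps(clock_mhz: float,
                    lookups_per_cycle: int = 2,
                    packet_bytes: int = 40) -> float:
    """Throughput in Gbps if each lookup classifies one packet of packet_bytes."""
    bits_per_cycle = lookups_per_cycle * packet_bytes * 8
    return clock_mhz * 1e6 * bits_per_cycle / 1e9


def implied_clock_mhz(target_gbps: float,
                      lookups_per_cycle: int = 2,
                      packet_bytes: int = 40) -> float:
    """Clock rate in MHz needed to reach target_gbps under the same assumptions."""
    bits_per_cycle = lookups_per_cycle * packet_bytes * 8
    return target_gbps * 1e9 / bits_per_cycle / 1e6


if __name__ == "__main__":
    # 100 K rules at under 20 bytes per rule stays below about 2 MB in total.
    print(memory_upper_bound_bytes(100_000, 20) / 1e6, "MB (upper bound)")
    # Clock rate that 134 Gbps would imply under the assumptions above.
    print(implied_clock_mhz(134.0), "MHz (implied, under stated assumptions)")
```

Under these assumptions the memory bound is what lets a 100 K rule table fit in on-chip Block RAM, and the implied clock scales inversely with the assumed packet size, so a different minimum packet size would change the clock estimate proportionally.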