Abstract: Buffered crossbar (CICQ) switches have shown high potential for scaling the capacity of Internet routers. However, they require expensive on-chip buffers whose cost grows quadratically with the port count. Additionally, as in traditional crossbars, point-to-point switching mandates long wires to connect inputs to outputs, resulting in non-negligible delays. In this paper, we propose a CICQ switching architecture in which the buffered crossbar fabric is designed as a Network on Chip (NoC). Instead of a dedicated buffer for every input-output port pair, we use on-chip routers, one for each crosspoint. Our design offers several advantages over traditional CICQs: 1) Speedup, because the fabric can operate faster owing to the small size of the NoC routers, their distributed arbitration, and the short wires connecting them. This is in contrast to single-hop crossbars, which use long wires and centralized arbitration. 2) Load balancing, because flows from different input-output port pairs share the same router buffers, whereas the internal buffers of traditional CICQs are each dedicated to a single input-output pair. 3) Path diversity, which allows traffic from an input port to follow different paths to its destination output port. This results in further load balancing, especially for non-uniform traffic, and provides better fault tolerance in the presence of interconnect failures. We analyze the performance of our architecture by simulation and present results under a wide range of traffic conditions and switch sizes. We prototyped a 32×32 NoC-based crossbar switch in CMOS technology. The implementation results suggest that the switch can be clocked at 413 MHz, reaching an aggregate throughput in excess of 10^10 ATM cells per second.
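
As a rough sanity check on the figures quoted above, the following sketch counts the crosspoints of an N×N buffered crossbar and estimates an upper bound on aggregate cell throughput. It assumes that each port delivers at most one cell per clock cycle; this is an illustrative assumption and not a statement of how the reported figure was derived. The function names and the `cells_per_port_per_cycle` parameter are hypothetical.

```python
# Back-of-envelope check of the abstract's figures.
# ASSUMPTION (not from the source): each port delivers at most one cell
# per clock cycle, so aggregate throughput <= ports * clock frequency.


def crosspoint_count(ports: int) -> int:
    """Crosspoints in an N x N crossbar: one per input-output pair (N^2)."""
    return ports * ports


def aggregate_cells_per_second(
    ports: int,
    clock_hz: float,
    cells_per_port_per_cycle: float = 1.0,  # hypothetical parameter
) -> float:
    """Upper-bound estimate of aggregate throughput in cells per second."""
    return ports * clock_hz * cells_per_port_per_cycle


if __name__ == "__main__":
    n = 32          # switch size reported in the abstract
    f_hz = 413e6    # clock frequency reported in the abstract

    # Quadratic growth: doubling the port count quadruples the crosspoints.
    for size in (8, 16, 32, 64):
        print(f"{size:>3} ports -> {crosspoint_count(size):>5} crosspoints")

    estimate = aggregate_cells_per_second(n, f_hz)
    print(f"Estimated aggregate throughput: {estimate:.3e} cells/s")
```

Under this assumption the estimate for 32 ports at 413 MHz exceeds 10^10 cells per second, which is consistent with the reported figure, while the crosspoint count illustrates why per-crosspoint resources grow quadratically with the port count.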