Abstract-One of the most widely used architectures for packet switches is the crossbar. A special version of a it is the buffered crossbar, where small buffers are associated with the crosspoints. The advantages of this organization, when compared to the unbuffered architecture, is that it needs much simpler and slower scheduling circuits, while it can shape the switched traffic according to a given set of Quality of Service (QoS) criteria in a more efficient way. Furthermore, by supporting variable length packets throughout a buffered crossbar: a) there is no need for segmentation and reassembly circuits, b) no internal speedup is necessary, and c) synchronization between the input and output clock domains is simplified. In this paper we present an architecture, a hardware implementation analysis, and a performance evaluation of such a buffered crossbar. The proposed organization is simple, yet powerful and can be easily implemented using today's technologies. Our evaluation shows that it outperforms most of the existing packet switch architectures, while its hardware cost is kept to a minimum.