Multicast switches have become indispensable in modern computer networks owing to the proliferation of multicast traffic on the Internet. A key issue affecting the performance of a multicast switch is how to reduce the data loss caused by blocking while packets are duplicated and routed. This paper proposes a multicast crossbar switch with an inner queue at each crosspoint. With the proposed architecture, no additional control circuits are needed to perform duplication and self-routing. To reduce data loss, duplicated packets are first stored in the inner queues and then transmitted at the beginning of the next time slot. By controlling the interarrival time between two consecutive groups of packets, our approach can reduce the data loss rate to 10^-6 or less. Owing to the simplicity of the proposed architecture, the switch is easy to implement in hardware and offers good scalability and stackability. The proposed multicast switch has been implemented and verified on an Altera Stratix II EP2S60F1020 chip. It operates at a clock rate of 80 MHz and uses only eight percent of the available look-up tables.