Along with the unbounded speedup and exponential growth of virtual queues requirement aiming for 100% throughput of multicast scheduling as the size of the high-speed switches scale, the issues of low throughput of multicast under non-speedup or fixed crosspoint buffer size is addressed. Inspired by the load balance two-stage Birkhoff-von Neumann architecture that can provide 100% throughput for all kinds of unicast traffic, a novel 3-stage architecture, consisting of the first stage for multicast fan-out splitting, the second stage for load balancing, and the last stage for switching (FSLBS) is proposed. And the dedicated multicast fan-out splitting to unicast (M2U) scheduling algorithm is developed for the first stage, while the scheduling algorithms in the last two stages adopt the periodic permutation matrix. FSLBS can achieve 100% throughput for integrated uni-and multicast traffic without speedup employing the dedicated M2U and periodic permutation matrix scheduling algorithm. The operation is theoretically validated adopting the fluid model.