Crossbar switches with input queues are common building blocks of high-speed networks, and their speed and performance depend critically on their scheduler. In this paper we combine ideas from randomized backlog-aware schedulers and their round-robin (RR) counterparts to propose a practical, deterministic crossbar scheduler that: (i) achieves almost full throughput under the many adverse traffic patterns tested, using just 1 Mbyte of buffer memory per input; (ii) provides deterministic delay guarantees; (iii) yields low delays under both uniform and non-uniform load; and (iv) achieves this performance with a single iteration of an iSLIP-like algorithm. With simple extensions, the proposed scheduler is shown to distribute the bandwidth of congested links in a fair RR or weighted round-robin (WRR) manner. To demonstrate the efficiency of the new scheduling algorithm, we implemented a 32×32 scheduler in hardware, using a novel design for programmable-priority RR arbiters that is significantly more area-speed efficient than the present state of the art. The scheduler's ASIC occupies roughly 3 mm² when implemented in a 130 nm process and produces a new crossbar match every 3.2 ns, as needed for line rates above 100 Gb/s with short packets.
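For readers unfamiliar with the baseline, the following is a minimal software sketch of one grant/accept iteration of a generic iSLIP-style matching built from programmable-priority RR arbiters. It is not the scheduler proposed here, and it omits the backlog-aware elements entirely. The names `rr_arbiter`, `islip_single_iteration`, `voq`, `grant_ptr`, and `accept_ptr` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional


def rr_arbiter(requests: List[bool], pointer: int) -> Optional[int]:
    """Programmable-priority RR arbiter (illustrative software model).

    Returns the first requesting index at or after `pointer`, wrapping
    around, or None when nobody requests.
    """
    n = len(requests)
    for offset in range(n):
        idx = (pointer + offset) % n
        if requests[idx]:
            return idx
    return None


def islip_single_iteration(
    voq: List[List[bool]], grant_ptr: List[int], accept_ptr: List[int]
) -> Dict[int, int]:
    """One grant/accept iteration of a generic iSLIP-style match.

    voq[i][j] is True when input i holds a cell for output j (assumed
    representation). Returns {input: output}. Pointers are updated in
    place, and only for grants that were accepted.
    """
    n = len(voq)

    # Grant phase: every output picks one requesting input.
    grants: Dict[int, int] = {}  # output -> granted input
    for j in range(n):
        column = [voq[i][j] for i in range(n)]
        winner = rr_arbiter(column, grant_ptr[j])
        if winner is not None:
            grants[j] = winner

    # Accept phase: every input picks one of the outputs that granted it.
    match: Dict[int, int] = {}
    for i in range(n):
        offered = [grants.get(j) == i for j in range(n)]
        chosen = rr_arbiter(offered, accept_ptr[i])
        if chosen is not None:
            match[i] = chosen
            # Advance both pointers one position past the matched pair.
            grant_ptr[chosen] = (i + 1) % n
            accept_ptr[i] = (chosen + 1) % n
    return match


# Usage: 2x2 switch; input 0 has cells for both outputs, input 1 for output 0.
voq = [[True, True], [True, False]]
grant_ptr, accept_ptr = [0, 0], [0, 0]
first = islip_single_iteration(voq, grant_ptr, accept_ptr)
second = islip_single_iteration(voq, grant_ptr, accept_ptr)
print(first, second)
```

In this toy run the queues are deliberately not drained between calls. The first call matches only one pair because both outputs grant the same input; the pointer updates then desynchronize the arbiters, so the second call finds a full match.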