A 9.5 mW, 20 Gb/s, 40 × 70 μm² inductorless 1:4 DEMUX in a 90 nm CMOS process is presented. To reduce power and area, the DEMUX uses a multi-phase clock architecture that requires fewer latches, operating at a slower clock rate, than the conventional tree architecture. To provide low-voltage scalability, the latches operate with a near-rail-to-rail logic swing. This is realized without significant speed penalty by adopting current-sourceless CML-type latches with unconventional settings. This approach also offers a larger noise margin and eliminates logic level converters. The well-balanced, scalable design could broaden the applications of high-speed SerDes in the coming ultralow-voltage many-core era.

Index Terms - DEMUX, CMOS, multi-phase clock architecture, near-rail-to-rail logic swing

I. INTRODUCTION

Applications of high-performance demultiplexers (DEMUXes) [1]-[6] have so far been limited chiefly to those that accept a design trade-off in favor of speed, such as fiber-optic communication systems. In typical optical wavelength-division multiplexing (WDM) systems, for example, a per-wavelength speed of 10 Gb/s or faster is required. The conventional wisdom for gaining or maintaining speed in the current-mode logic (CML)-type circuits used in [1]-[6] has been to reduce the signal amplitude. The dilemma is that noise immunity may have to be traded off. Another problem is that CML-type circuits do not fit well with the trend toward lower supply voltages. Beyond speed, power and area are also important considerations, since WDM systems need as many DEMUXes as there are wavelengths.

We present a 1:4 DEMUX with a near-rail-to-rail architecture that offers a possible solution to the challenges of signal integrity and low-voltage scalability, as well as of balancing speed, power, and area.
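For readers less familiar with the function itself, the 1:4 demultiplexing operation can be sketched behaviorally as a de-interleaving of the serial stream into four parallel lanes. The sketch below is only a bit-level illustration, not a model of the circuit: the function names `demux_1to4` and `lane_rate_gbps` are hypothetical, and the assumption that each lane takes every fourth bit in arrival order (so that each lane runs at one quarter of the serial rate) is ours rather than a detail stated in the text.

```python
from typing import List


def demux_1to4(serial_bits: List[int]) -> List[List[int]]:
    """Behavioral 1:4 demultiplexer (illustrative assumption, not the circuit).

    Lane k receives serial bits k, k+4, k+8, ... in arrival order.
    """
    if len(serial_bits) % 4 != 0:
        raise ValueError("input length must be a multiple of 4")
    # Slice with stride 4, starting at offset k, to build each lane.
    return [serial_bits[k::4] for k in range(4)]


def lane_rate_gbps(serial_rate_gbps: float, ways: int = 4) -> float:
    """Per-lane data rate after an ideal 1:ways demultiplexer."""
    return serial_rate_gbps / ways


if __name__ == "__main__":
    lanes = demux_1to4([1, 0, 1, 1, 0, 0, 1, 0])
    for k, lane in enumerate(lanes):
        print(f"lane {k}: {lane}")
    print(f"per-lane rate: {lane_rate_gbps(20.0)} Gb/s")
```

Under these assumptions, a 20 Gb/s serial input yields four 5 Gb/s outputs, which illustrates why the downstream logic can run at a lower rate than the input.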