As transistor switching speed improves, synchronizing a global clock increasingly degrades system performance. Self-timed asynchronous logic therefore becomes potentially faster than synchronous logic. To realize this potential, however, it must exploit the techniques used in fast synchronous designs, including redundant logic, inverting logic, transistor size optimization, dynamic logic, and phase alignment. Most of these techniques apply equally well to asynchronous logic, and phase alignment is in fact easier; combining dynamic and asynchronous logic, however, is more difficult. We must guarantee minimum refresh intervals, together with race-free and hazard-free operation. This paper describes an initial chip implementation that combines dynamic and asynchronous logic, running at 500 MHz in 2 µm CMOS. With the addition of transistor size optimization, simulations show the same circuit running in the same technology at 800 MHz.
I. Introduction
Some applications, especially those in telecommunications, need both the density of CMOS and high switching speed [1]. This paper describes an effort to build fast CMOS circuits without significantly losing the density benefit of CMOS, increasing the design complexity, or increasing production costs. Specifically, we propose that asynchronous logic, combined with other fast CMOS logic techniques, can help improve speed by removing the need to synchronize a global clock.
Many of the fast CMOS design techniques in this paper are derived from the excellent work at Linköping University [2] [3]. Recently, that group produced a shift register running at 450 MHz in 3 µm CMOS [4]. The aim of this paper is to extend their ideas by removing even the single-phase clock they use for synchronization. The resultant asynchronous designs achieve speeds comparable with their results in current technologies (minimum transistor sizes over 1 µm). We believe, however, that asynchronous logic is likely to become more attractive as CMOS technology allows faster transistors.
Asynchronous communication is not new [5]. More recently, Martin has proposed a method of producing designs by compilation [6]; Sutherland has proposed a method called micropipelines [7]; and we have described a method called 4-state coding [8]. These methods tend to emphasize the reliability or reduced design complexity of asynchronous designs. This paper concentrates instead on the potential of asynchronous designs for speed. We propose using acknowledged asynchronous logic for local on-chip communication and unacknowledged asynchronous phase alignment buffers for long-distance chip-to-chip communication. Although other asynchronous techniques can be used, we concentrate on a redundant variation of 4-state coding that is particularly well suited to fast implementation.