This paper presents both a retrospective of the development of network interface architecture and performance and conformance data from a range of contemporary devices sporting various performance-enhancing technologies. The data shows that 10Gb/s networking is now possible without stateful offload and while consuming less than one CPU core on a contemporary commodity server.
INTRODUCTION
From ARPANET [6] and Ethernet [30] [33,21]. These developments up to the mid-90s have come to represent the roots of today's mainstream LAN interface designs. However, while it is the case that offloads are now regarded as commodity items, the desirability and utility of even the simplest offload should remain under debate. For example, Stone and Partridge [35] describe a study of the root cause of network errors which escape the Ethernet FCS check and find many systematic errors in hardware and software.
One architecture which received attention from the late-80s was that of executing the transport protocol on the host network interface. Implementations include the XTP Protocol Engine [8], the Nectar communications processor [14], and the VMP adaptor [23]. Follow-on work [22,33] based on the Protocol Engine architecture implemented TCP/IP on the host interface for a 622Mb/s ATM network. This TCP/IP offload architecture was rejected [10,24] and has not subsequently been taken up to any significant degree by the academic community, except recently as a means to an end for the support of Remote Direct Memory Access (RDMA) protocols [31].
Another architectural choice which is periodically revisited is whether to perform protocol processing in user or kernel space. Druschel and Davie implemented [13] an interface which allowed user-space programs direct access to an ATM adaptor, and Thekkath describes [37] an implementation of a user-level TCP/IP stack over Mach. As well as performance, this work was concerned with the issues of multi-protocol co-existence and efficient operation in a micro-kernel environment. By contrast, the Jetstream/Afterburner adaptor [15] was used to implement TCP/IP in user space over a monolithic kernel, and Pratt did the same using a firmware-modified Gigabit Ethernet NIC [32]. These were essentially ports of their respective kernel protocol stacks to user level over a protected hardware interface.
Hybrid models have also been proposed; see [28] for a survey and perspective. However, achieving good all-round performance has proven elusive without a protocol stack implementation expressly designed for user-level operation.
Over the same late-80s to mid-90s period, there was considerable parallel activity in multicomputer network interface architecture. The ATOMIC project [11] utilised components from the Mosaic multicomputer to construct a Gb/s LAN (later commercialised [3]). The multicomputer environment was somewhat less constrained than that of the ATM or distributed systems environments; application behaviour suited low-overhead user-level abstractions of communication, and these abstra...