This paper presents detailed measurements of processing overheads for the Ultrix 4.2a implementation of TCP/IP network software running on a DECstation 5000/200. The performance results were used to uncover throughput and latency bottlenecks. We present a scheme for improving throughput when sending large messages by avoiding most checksum computations in a relatively safe manner. We also show that for the implementation we studied, reducing latency (when sending small messages) is a more difficult problem because processing overheads are spread over many operations; achieving significant savings would require optimizing many different mechanisms. This is especially important because, when processing a realistic workload, we have found that non-data-touching operations consume more time in aggregate than data-touching operations.
Introduction

We analyze TCP/IP [30] and UDP/IP [29] processing overheads given a real workload on a DECstation 5000/200 running Ultrix 4.2a, and we use this information to guide our development of new optimizations. The costs of the various processing overheads depend on message size; consequently, our optimizations take into account the message size distributions derived from the network traffic in our environment (which is not atypical of many academic and office environments).

In our analysis of the TCP/IP and UDP/IP LAN and WAN traffic we were able to collect, we find that message sizes are far from uniformly distributed; rather, most messages are either very small or very large. Small messages are usually used to carry control information, whereas large messages typically carry bulk data. Different kinds of optimizations can improve processing speed for each type of traffic; in this paper we discuss both.

Typical processing time breakdowns for short (i.e., 64-128 byte) control messages fundamentally differ from those of long, multiple-kilobyte data messages. The processing time of large messages is dominated by data-touching operations such as copying and computing checksums [4, 8-11, 15, 24, 36], because these operations must be applied to each byte. However, small messages have few bytes of data, and thus their processing time is dominated by non-data-touching operations.

To optimize processing of large messages, we describe a checksum redundancy avoidance algorithm that eliminates most checksum processing without sacrificing reliability. Since checksum processing alone consumes nearly half the total processing time of large messages, this optimization improves throughput considerably.

On both the LAN and WAN we studied, both of which are typical Unix-networking environments, small messages far outnumber large messages. In fact, even though processing a large message requires more time, the large proportion of small messages causes the cumulative non-data-touching processing time to exceed the cumulative data-touching processing time. We show that it would be difficult to significantly reduce the average processing time of non-data-touching overheads, at lea...
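To illustrate why checksumming is a data-touching cost that grows linearly with message size, the following is a minimal sketch of the standard Internet checksum (RFC 1071) computed by TCP and UDP over their payloads; the function name in_cksum and the coding style are illustrative and are not taken from the Ultrix sources.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Standard Internet checksum (RFC 1071): a 16-bit one's-complement sum
 * over the message.  Every byte must be loaded and added, which is why
 * checksum cost scales with message size.  Assumes the buffer is at
 * least 16-bit aligned, as packet buffers normally are.
 */
uint16_t in_cksum(const void *data, size_t len)
{
    const uint16_t *p = data;
    uint32_t sum = 0;

    while (len > 1) {            /* sum 16-bit words */
        sum += *p++;
        len -= 2;
    }
    if (len == 1)                /* pick up a trailing odd byte */
        sum += *(const uint8_t *)p;

    while (sum >> 16)            /* fold carries back into the low 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;       /* one's complement of the sum */
}
```

A checksum redundancy avoidance scheme removes this per-byte loop from the common case, presumably by relying on protection already provided elsewhere (for example, link-level CRCs); the conditions under which this is safe are developed later in the paper.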