Programming network processors is challenging. To sustain high line rates, network processors have extremely tight memory access and instruction budgets. Achieving desired performance has traditionally required hand-coded assembly. Researchers have recently proposed high-level programming languages for packet processing, but the challenges of compiling these languages into code that is competitive with hand-tuned assembly remain unanswered.

This paper describes the Shangri-La compiler, which accepts a packet program written in a C-like high-level language and applies scalar and specialized optimizations to generate a highly optimized binary. Hot code paths identified by profiling are mapped across processing elements to maximize processor utilization. Since our compilation target has no hardware caches, software-controlled caches are generated for frequently accessed application data structures. Packet handling optimizations significantly reduce per-packet memory access and instruction counts. Finally, a custom stack model maps stack frames to the fastest levels of the target processor's heterogeneous memory hierarchy.

Binaries generated by the compiler were evaluated on the Intel IXP2400 network processor with eight packet processing cores and eight threads per core. Our results show the importance of both traditional and specialized optimization techniques for achieving the maximum forwarding rates on three network applications, L3-Switch, MPLS and Firewall.

Keywords: [...], program partitioning, dataflow programming.

[...] hand-coded assembly. Networks have mostly relied on the widespread use of a few core packet programs. To achieve high line rates, though, a network program usually has very tight memory access and instruction count budgets.
In the past, careful hand-optimization of assembly code has been the most effective means of achieving the required performance given the small kernels and difficult resource constraints.

As networks have evolved, so has the code running them, becoming larger, more diverse and complex. More and more network protocols are being developed for specialized applications (e.g. wireless, VoIP, proxies, network-attached storage). Enhancements to base protocols have been implemented to satisfy load balancing, security and reliability requirements. Shipped network hardware must operate correctly in an increasing number of different configurations.

Hand-coded assembly has become a hindrance to developing new network applications and updating existing applications. Reusing common routines written in assembly in different contexts is difficult, debugging assembly code is extremely tedious and maintaining assembly code is a time-consuming effort. Even state-of-the-art tools that abstract some of the assembly programming details still expose to programmers the multi-threaded packet processing cores, heterogeneous memories and custom communication topologies found on network processors.

Shangri-La, which consists of a programming language, a compiler and a runtime system, simplifies development and accelerates perfor...