The High Performance Fortran (HPF) benchmark suite HPFBench is designed for evaluating the HPF language and compilers on scalable architectures. The functionality of the benchmarks covers scientific software library functions and application kernels that reflect the computational structure and communication patterns in fluid dynamic simulations, fundamental physics, and molecular studies in chemistry and biology. The benchmarks are characterized in terms of FLOP count, memory usage, communication pattern, local memory accesses, array allocation mechanism, as well as operation and communication counts per iteration. The benchmarks output performance evaluation metrics in the form of elapsed times, FLOP rates, and communication time breakdowns. We also provide a benchmark guide to aid the choice of subsets of the benchmarks for evaluating particular aspects of an HPF compiler. Furthermore, we report an evaluation of an industry-leading HPF compiler from the Portland Group Inc. using the HPFBench benchmarks on the distributed-memory IBM SP2.
Abstract. Transferring active networking technology from the research arena to everyday deployment on desktop and edge router nodes requires a NodeOS design that simultaneously meets three goals: (1) be embedded within a widespread, open source operating system; (2) allow non-active applications and regular operating system operation to proceed normally, unhindered by the active networking component; and (3) offer performance competitive with that of the networking stacks of general-purpose operating systems. Previous NodeOS systems (Bowman, Janos, AMP, and Scout) only partially addressed these goals. Our contribution lies in the design and implementation of such a system, a NodeOS within the Linux kernel, and the demonstration of competitive performance for medium and larger packet sizes. We also illustrate how such a design lends itself readily to the deployment of other networking architectures, such as peer-to-peer networks and extensible routers.
This paper describes our effort to build extensible routers using a combination of general-purpose and network processors. We emphasize five overriding challenges that dictate our design decisions: (1) optimal resource allocation; (2) efficient but flexible scheduling of the CPU; (3) maintaining overall router robustness; (4) maximizing router performance; and (5) providing sufficient extensibility to enable the injection of new functionality into the router. We adopt a hierarchical architecture, in which packet flows traverse a range of processing/forwarding paths, thereby partitioning hardware and software in concert. This paper both presents the architecture and describes our experiences implementing the architecture and addressing the five design challenges in a prototype built from an Intel IXP 1200 and a Pentium.
... programmed to filter packets, translate addresses, make level-n routing decisions, broker quality of service (QoS) reservations, thin data streams, run proxies, support computationally-weak home electronic devices, serve as the front-end to scalable clusters, support application-specific overlay networks, and even dynamically inject application-specific forwarding code, an effort researched by active networks [1][2][3][4].
In response to the pressure of these trends, and in order to enable the myriad of services migrating into the network, we extend the notion of a router to include host and server functionality in a flexible way, coining the term extensible router. In addition to supporting extensibility, we have adopted a strategy that employs commercial off-the-shelf (COTS) hardware rather than a custom design, using a hybrid of the newly emerging class of network processors and PCs [5].
Network processors are designed to operate under demanding performance requirements and commonly employ parallelism to hide memory latency. They should, therefore, be responsible for forwarding minimal-processing packets at line speed, accounting for the bulk of the packets: the data plane. PCs, on the other hand, are cheap commodities with powerful CPUs, which can be employed to process more computationally intensive packets that are received much less often: the control plane. Nowadays, however, the distinction between data and control plane packets is often blurred, with data plane packets requiring more processing, such as the evaluation of firewall rules, or a larger number of control plane packets being received in a single stream, such as the injection of forwarding code into the network (the Active Networks initiative). This fact, coupled with our hybrid hardware, has resulted in a router design with a multi-level processor and execution hierarchy. The hierarchical framework spans all the paths a packet traverses, from the ports, across the processor hierarchy of a network processor, through the kernel space of a general-purpose processor, all the way to the middlebox services and user applications.
The main contribution of this paper is to conceptually partition an extensible router into a hierarchy of exe...