Data transfer is now an essential function for scientific discovery, particularly in big data environments. Supporting data transfer for big data science requires high-performance, scalable, end-to-end, programmable networks that enable science applications to use the network most efficiently. The existing network paradigm that supports big data science consists of three major components: terabit networks that provide high bandwidth; Data Transfer Nodes (DTNs) and the Science DMZ architecture, which bypass the performance hotspots of typical campus networks; and on-demand secure circuit/path reservation systems, such as ESnet OSCARS and Internet2 AL2S, which provide automated, guaranteed-bandwidth service in the WAN. This network paradigm has proven very successful. However, to reach its full potential, we claim that the existing network paradigm for big data science must address three major problems: the last-mile problem, the scalability problem, and the programmability problem. To address these problems, we propose a solution called AmoebaNet. AmoebaNet applies Software-Defined Networking (SDN) technology to provide "QoS-guaranteed" network services in campus or local area networks. AmoebaNet complements the existing network paradigm for big data science: it allows applications to program networks at run time for optimum performance and, in conjunction with WAN circuit/path reservation systems such as ESnet OSCARS and Internet2 AL2S, it solves the last-mile problem and the scalability problem.
• Programmability. This feature enables science applications to program networks at run time to suit their needs, as sketched below.
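As an illustration of this programmability, here is a minimal sketch of how a science application might request a QoS-guaranteed path from a campus SDN controller at run time. The controller URL, endpoint, and request fields below are assumptions for illustration only; the abstract does not specify AmoebaNet's actual API.

import requests

# Hypothetical controller endpoint; AmoebaNet's real interface may differ.
CONTROLLER = "http://sdn-controller.example.org:8080"

def reserve_path(src_ip, dst_ip, bandwidth_mbps, duration_s):
    """Ask the SDN controller to set up a QoS-guaranteed path at run time."""
    request = {
        "src": src_ip,
        "dst": dst_ip,
        "bandwidth_mbps": bandwidth_mbps,  # guaranteed rate for the flow
        "duration_s": duration_s,          # reservation lifetime
    }
    resp = requests.post(f"{CONTROLLER}/paths", json=request, timeout=10)
    resp.raise_for_status()
    return resp.json()  # e.g., a path id the application can later release

# A data transfer application could reserve the campus segment before a bulk
# transfer, pairing it with a WAN circuit from OSCARS/AL2S:
# reservation = reserve_path("192.0.2.10", "203.0.113.5", 5000, 3600)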
TCP performs poorly in networks with serious packet reordering. Processing reordered packets in the TCP layer is costly and inefficient, involving interaction between the sender and the receiver. Motivated by the interrupt coalescing mechanism, which delivers packets upward for protocol processing in blocks, we propose a new strategy, Sorting Reordered Packets with Interrupt Coalescing (SRPIC), to reduce packet reordering at the receiver. SRPIC works in the network device driver; it uses the interrupt coalescing mechanism to sort the reordered packets belonging to the same TCP stream within a block of packets before delivering them upward, so that each sorted block is internally ordered. Experiments have demonstrated the effectiveness of SRPIC against forward-path reordering.
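A minimal sketch of the sorting step, assuming packets arrive as one interrupt-coalesced block and that each packet exposes its TCP flow identifier and sequence number. The field names are illustrative, not the actual driver data structures, and the sketch simplifies by grouping each stream's packets together while keeping streams in first-seen order:

from collections import defaultdict

def srpic_sort(block):
    """Sort each TCP stream's packets by sequence number within one block."""
    by_flow = defaultdict(list)
    first_seen = []  # preserve the order in which flows appear in the block
    for pkt in block:
        if pkt["flow"] not in by_flow:
            first_seen.append(pkt["flow"])
        by_flow[pkt["flow"]].append(pkt)
    out = []
    for flow in first_seen:
        out.extend(sorted(by_flow[flow], key=lambda p: p["seq"]))
    return out  # internally ordered block, ready to deliver upward

Because the sort happens once per coalesced block in the driver, the TCP layer sees in-order segments and avoids the costly reordering machinery (duplicate ACKs and spurious retransmissions) described above.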
Lambda Station is an ongoing project of Fermi National Accelerator Laboratory and the California Institute of Technology. The goal of this project is to design, develop, and deploy network services for path selection, admission control, and flow-based forwarding of traffic among data-intensive Grid applications, such as those used in High Energy Physics and other communities. Lambda Station deals with the last-mile problem in local area networks, connecting production clusters to a rich array of wide area networks. Selective forwarding of traffic is controlled dynamically at the demand of applications. This paper introduces the motivation of the project, its design principles, and its current status. Integration of the Lambda Station client API with essential Grid middleware, such as the dCache/SRM Storage Resource Manager, is also described. Finally, the results of applying Lambda Station services to development and production clusters at Fermilab and Caltech over advanced networks, such as DOE's UltraScience Net and NSF's UltraLight, are covered.

PROJECT OVERVIEW

The main goal of the Lambda Station project is to design, develop, and deploy a network path selection service to interface production storage and computing facilities with advanced research networks. In the future, when the corresponding APIs are available, Lambda Station will also take on the task of negotiating with reservation or provisioning systems that may regulate the WAN control planes. Policy-based routing (PBR) is used to implement flow-specific routing in the LAN and at the border between LAN and WAN. In the next section of this paper we discuss how Lambda Station serves the unprecedented demands for data movement of running experiments such as CDF, D0, and BaBar, as well as the upcoming LHC experiments. From our point of view, available data communication technology will not be able to satisfy these demands simply by increasing bandwidth in LANs and commodity WANs, due to technology limitations and high deployment and operational costs. Selective forwarding of high-impact data to alternate network paths on a per-flow basis is desirable, while leaving other traffic on regular paths. The ability to selectively forward traffic requires a control unit that can dynamically reconfigure forwarding of specific flows within local production-use routers on demand of applications. We refer to such a control unit as Lambda Station. If one envisions the optical network paths provided by advanced optical research networks as high-bandwidth data railways, then Lambda Station is functionally the railroad terminal that regulates which flows at the local site are directed onto those railways. Lambda Station coordinates network path availability, scheduling, and setup; directs appropriate forwarding within the local network infrastructure; and provides the application with the information necessary to utilize the high-bandwidth path. Having created Lambda Station, we introduce awareness and exploitation of advanced networking into d...
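To make the PBR mechanism concrete, here is a small sketch of the kind of flow-matching configuration Lambda Station would push into a site router. The Cisco-IOS-style syntax, route-map name, and all addresses are illustrative assumptions; real deployments vary by platform.

def pbr_snippet(acl_id, src_ip, dst_ip, next_hop):
    """Render a policy-based-routing stanza steering one flow to an
    alternate next hop (e.g., toward an advanced research network)."""
    return "\n".join([
        f"access-list {acl_id} permit tcp host {src_ip} host {dst_ip}",
        "route-map LAMBDA-STATION permit 10",
        f" match ip address {acl_id}",
        f" set ip next-hop {next_hop}",
    ])

# Steer one high-impact flow onto the alternate path; all other traffic
# keeps following the regular routing table:
print(pbr_snippet(150, "192.0.2.10", "203.0.113.5", "198.51.100.1"))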
Big Data has emerged as a driving force for scientific discovery. Large scientific instruments (e.g., colliders and telescopes) generate exponentially increasing volumes of data. To enable scientific discovery, science data must be collected, indexed, archived, shared, and analyzed, typically in a widely distributed, highly collaborative manner. Data transfer is now an essential function for scientific discovery, particularly in big data environments. Although significant improvements have been made in bulk data transfer, currently available data transfer tools and services cannot meet the high-performance and time-constraint challenges of data transfer required by extreme-scale science applications, for three reasons: disjoint end-to-end data transfer loops, cross-interference between data transfers, and obliviousness to user requirements (deadlines and QoS). Fermilab has been working on the BigData Express project to address these problems. BigData Express seeks to provide a schedulable, predictable, and high-performance data transfer service for big data science. The BigData Express software is being deployed and evaluated at multiple research institutions, including UMD, StarLight, FNAL, KISTI, KSTAR, SURFnet, Ciena, and other sites.
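As a sketch of what "schedulable" and "time-constraint" mean in practice, the following shows a deadline-aware admission check a transfer scheduler could perform. The request fields and hostnames are assumptions for illustration, not the actual BigData Express interface.

from datetime import datetime, timezone

def required_gbps(size_tb, deadline_iso, now=None):
    """Minimum sustained rate (Gb/s) to move size_tb terabytes by deadline_iso."""
    now = now or datetime.now(timezone.utc)
    deadline = datetime.fromisoformat(deadline_iso.replace("Z", "+00:00"))
    seconds = (deadline - now).total_seconds()
    return (size_tb * 8e12) / seconds / 1e9  # terabytes -> gigabits per second

request = {
    "source": "dtn-a.example.org:/data/run7/",        # hypothetical DTN paths
    "destination": "dtn-b.example.org:/archive/run7/",
    "size_tb": 120,
    "deadline": "2025-07-01T00:00:00Z",
}

# The scheduler admits the transfer only if the required rate fits within the
# end-to-end bandwidth it can guarantee; otherwise it must renegotiate.
needed = required_gbps(request["size_tb"], request["deadline"])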