Video-on-Demand (VoD) is a compelling application, but costly due to the load it places on servers. Peer-to-peer (P2P) techniques hold the potential to reduce centralized costs by sharing data between peers. There are many difficult design issues associated with P2P for VoD. Viewing the problem as designing a large distributed cache, many of the issues can be expressed in terms of caching algorithms. In an earlier paper [6], we studied the performance of GridCast, a P2P VoD system deployed on CERNET. From system traces, we found that departure misses are the major cause of server load. Motivated by this finding, this paper examines how to use replication to decrease departure misses and thereby further reduce server load. This paper proposes and evaluates a framework for lazy replication. Lazy replication postpones replication, trying to make efficient use of bandwidth. In our framework, two predictors are plugged in to create the working replication algorithm. Lazy replication with several predictors is compared with a naïve eager replication algorithm. We find that lazy replication is more efficient than eager replication, even when using two simple predictors. With these two simple predictors, lazy replication can decrease server load by 15% from multivideo caching with only a minor increase in network traffic.
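The idea of plugging two predictors into a lazy replication loop can be sketched as follows. This is only an illustration under stated assumptions, not the algorithm from the paper: the abstract does not say what the two predictors estimate, so the sketch assumes one predicts whether a peer is about to depart and the other predicts whether a chunk will be requested again. The names `Peer`, `lazy_replicate`, `will_depart`, and `will_be_requested` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple


@dataclass
class Peer:
    name: str
    cache: Set[str] = field(default_factory=set)  # chunk ids held by this peer
    session_minutes: float = 0.0                  # time online so far


# Hypothetical predictor signatures; the source does not define them.
DeparturePredictor = Callable[[Peer], bool]  # True if the peer is likely to leave soon
DemandPredictor = Callable[[str], bool]      # True if the chunk is likely to be requested again


def holders(peers: List[Peer], chunk: str) -> List[Peer]:
    """Return every peer that currently caches the given chunk."""
    return [p for p in peers if chunk in p.cache]


def lazy_replicate(
    peers: List[Peer],
    will_depart: DeparturePredictor,
    will_be_requested: DemandPredictor,
) -> List[Tuple[str, str, str]]:
    """Copy a chunk only when its sole holder is predicted to leave
    and the chunk itself is predicted to be needed again.

    Returns the transfers performed as (chunk, from_peer, to_peer).
    """
    transfers = []
    for peer in peers:
        if not will_depart(peer):
            continue  # lazy: postpone replication while the holder is expected to stay
        for chunk in sorted(peer.cache):
            if not will_be_requested(chunk):
                continue  # no predicted demand, so spend no bandwidth on it
            if len(holders(peers, chunk)) > 1:
                continue  # another copy already exists among the peers
            targets = [p for p in peers if p is not peer and not will_depart(p)]
            if not targets:
                continue  # nobody suitable to receive the copy
            target = min(targets, key=lambda p: len(p.cache))  # least-loaded stable peer
            target.cache.add(chunk)
            transfers.append((chunk, peer.name, target.name))
    return transfers
```

An eager scheme, by contrast, would copy each chunk as soon as it is cached, whether or not its holder is about to leave. In this sketch the predictors are plain callables, so a threshold on session length or a popularity count could be swapped in without touching the loop.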
Video-on-Demand (VoD) is a compelling application, but costly. VoD is costly due to the load it places on video source servers. Many have proposed using peer-to-peer (P2P) techniques to shift load from servers to peers. Yet, nobody has implemented and deployed a system to openly and systematically evaluate how these techniques work. This article describes the design, implementation and evaluation of GridCast, a real deployed P2P VoD system. GridCast has been live on CERNET since May of 2006. It provides seek, pause, and play operations, and employs peer sharing to improve system scalability. In peak months, GridCast has served videos to 23,000 unique users. From the first deployment, we have gathered information to understand the system and evaluate how to further improve peer sharing through caching and replication. We first show that GridCast with single video caching (SVC) can decrease load on source servers by an average of 22% from a client-server architecture. We analyze the net effect on system resources and determine that peer upload is largely idle. This leads us to changing the caching algorithm to cache multiple videos (MVC). MVC decreases source load by an average of 51% over the client-server. The improvement is greater as user load increases. This bodes well for peer-assistance at larger scales. A detailed analysis of MVC shows that departure misses become a major issue in a P2P VoD system with caching optimization. Motivated by this observation, we examine how to use replication to eliminate departure misses and further reduce server load. A framework for lazy replication is presented and evaluated in this article. In this framework, two predictors are plugged in to create the working replication algorithm. With these two simple predictors, lazy replication can decrease server load by 15% from MVC with only a minor increase in network traffic.
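To see how the reported reductions relate to one another, the short calculation below expresses each configuration as a fraction of the client-server source load. It rests on two assumptions that the abstract does not confirm: that the 15% figure is relative to the MVC load (the text says "from MVC"), and that the averages were measured on comparable workloads so they can be composed multiplicatively. The combined figure is therefore a rough illustration, not a reported result, and `remaining_load` is a hypothetical helper.

```python
def remaining_load(*reductions: float) -> float:
    """Fraction of the baseline load left after applying each relative reduction in turn."""
    load = 1.0
    for r in reductions:
        load *= 1.0 - r
    return load


svc = remaining_load(0.22)             # single video caching vs. client-server
mvc = remaining_load(0.51)             # multiple video caching vs. client-server
mvc_lazy = remaining_load(0.51, 0.15)  # assumed: a further 15% cut relative to MVC

for label, load in [("SVC", svc), ("MVC", mvc), ("MVC + lazy replication", mvc_lazy)]:
    print(f"{label}: {load:.1%} of client-server source load")
```

Under these assumptions, MVC with lazy replication would leave roughly 42% of the client-server source load, compared with 49% for MVC alone.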
This paper presents the design, implementation and evaluation of a dataflow system, including a dataflow programming model and a dataflow engine, for coarse-grained distributed data-intensive applications. The dataflow programming model provides users with a transparent interface for application programming and execution management in a parallel and distributed computing environment. The dataflow engine dispatches tasks onto candidate distributed computing resources in the system, and handles failures and load balancing in a transparent manner. The system has been implemented on the .NET platform and deployed in a Windows Desktop Grid. This paper uses two benchmarks to demonstrate the scalability and fault tolerance properties of our system.

Introduction

Due to the growing popularity of networked computing environments and the emergence of multi-core processors, parallel and distributed computing is now required at all levels of application development, from desktops to Internet-scale computing environments, such as Grid [6] and P2P. However, programming on distributed resources, especially for parallel applications, is more difficult than programming in a centralized environment. There are many research systems that simplify distributed computing. These include BOINC [3], XtremWeb [5], Alchemi [1], and JNGI [9]. These systems divide a job into a number of independent tasks. Applications that can be parallelized in this way are called "embarrassingly parallel". However, many algorithms cannot be expressed as independent tasks because of internal data dependencies. The work presented in this paper aims to support advanced applications containing multiple tasks with data dependency relationships. Many resource-intensive applications consist of multiple modules, each of which receives input data, performs computations and generates output. Scientific applications of this nature include genomics [16], simulation [8], data mining