Failures are normal rather than exceptional in the cloud computing environments. To improve system availability, replicating the popular data to multiple suitable locations is an advisable choice, as users can access the data from a nearby site. This is, however, not the case for replicas which must have a fixed number of copies on several locations. How to decide a reasonable number and right locations for replicas has become a challenge in the cloud computing. In this paper, a dynamic data replication strategy is put forward with a brief survey of replication strategy suitable for distributed computing environments. It includes: 1) analyzing and modeling the relationship between system availability and the number of replicas; 2) evaluating and identifying the popular data and triggering a replication operation when the popularity data passes a dynamic threshold; 3) calculating a suitable number of copies to meet a reasonable system byte effective rate requirement and placing replicas among data nodes in a balanced way; 4) designing the dynamic data replication algorithm in a cloud. Experimental results demonstrate the efficiency and effectiveness of the improved system brought by the proposed strategy in a cloud.
Prediction is one of the most attractive aspects in data mining. Link prediction has recently attracted the attention of many researchers as an effective technique to be used in graph based models in general and in particular for social network analysis due to the recent popularity of the field. Link prediction helps to understand associations between nodes in social communities. Existing link prediction-related approaches described in the literature are limited to predict links that are anticipated to exist in the future. To the best of our knowledge, none of the previous works in this area has explored the prediction of links that could disappear in the future. We argue that the latter set of links are important to know about; they are at least equally important as and do complement the positive link prediction process in order to plan better for the future. In this paper, we propose a link prediction model which is capable of predicting both links that might exist and links that may disappear in the future. The model has been successfully applied in two different though very related domains, namely health care and gene expression networks. The former application concentrates on physicians and their interactions while the second application covers genes and their interactions. We have tested our model using different classifiers and the reported results are encouraging. Finally, we compare our approach with the internal links approach and we reached the conclusion that our approach performs very well in both bipartite and non-bipartite graphs.
Online scheduling plays a key role for big data streaming applications in a big data stream computing environment, as the arrival rate of high velocity continuous data stream might fluctuate over time. In this paper, an elastic online scheduling framework for big data streaming applications (E-Stream) is proposed, exhibiting the following features. (1) Profile mathematical relationships between system response time, multiple application fairness, and online features of high velocity continuous stream. (2) Scale out or scale in a data stream graph by quantifying computation and communication cost, and the vertex semantics for arrival rate of data stream, and adjust the degree of parallelism of vertices in the graph. Sub-graph is further constructed to minimize data dependencies among the sub-graphs. (3) Elastically schedule a graph by a priority based earliest finish time first online scheduling strategy, and schedule multiple graphs by a max-min fairness strategy. (4) Evaluate the low system response time and acceptable applications fairness objectives in a realworld big data stream computing environment. Experimental results conclusively demonstrate that the proposed E-Stream provides better system response time and applications fairness compared to the existing Storm framework.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.