Joarder Mohammad Mustafa Kamal scite author profile

Future Generation Computer Systems

Buyya

2016

h i g h l i g h t s• We propose incremental repartitioning of distributed OLTP databases for high-scalability.• We model two incremental repartitioning algorithm and lookup mechanism.• We develop a unique transaction generation model for simulation.• We derive novel impact metrics for distributed transactions.• Simulation results indicate adaptability of the methods scalable OLTP applications. On-line Transaction Processing (OLTP) applications often rely on shared-nothing distributed databases that can sustain rapid growth in data volume. Distributed transactions (DTs) that involve data tuples from multiple geo-distributed servers can adversely impact the performance of such databases, especially when the transactions are short-lived and these require immediate responses. The k-way min-cut graph clustering based database repartitioning algorithms can be used to reduce the number of DTs with acceptable level of load balancing. Web applications, where DT profile changes over time due to dynamically varying workload patterns, frequent database repartitioning is needed to keep up with the change. This paper addresses this emerging challenge by introducing incremental repartitioning. In each repartitioning cycle, DT profile is learnt online and k-way min-cut clustering algorithm is applied on a special sub-graph representing all DTs as well as those non-DTs that have at least one tuple in a DT. The latter ensures that the min-cut algorithm minimally reintroduces new DTs from the non-DTs while maximally transforming existing DTs into non-DTs in the new partitioning. Potential load imbalance risk is mitigated by applying the graph clustering algorithm on the finer logical partitions instead of the servers and relying on random one-to-one cluster-to-partition mapping that naturally balances out loads. Inter-server datamigration due to repartitioning is kept in check with two special mappings favouring the current partition of majority tuples in a cluster-the many-to-one version minimising data migrations alone and the one-toone version reducing data migration without affecting load balancing. A distributed data lookup process, inspired by the roaming protocol in mobile networks, is introduced to efficiently handle data migration without affecting scalability. The effectiveness of the proposed framework is evaluated on realistic TPC-C workloads comprehensively using graph, hypergraph, and compressed hypergraph representations used in the literature. To compare the performance of any incremental repartitioning framework without any bias of the external min-cut algorithm due to graph size variations, a transaction generation model is developed that can maintain a target number of unique transactions in any arbitrary observation window, irrespective of new transaction arrival rate. The overall impact of DTs at any instance is estimated from the exponential moving average of the recurrence period of unique transactions to avoid transient fluctuations. The effectiveness and adaptability of the proposed incremental repartition...

Workload-Aware Incremental Repartitioning of Shared-Nothing Distributed Databases for Scalable Cloud Applications

Buyya

2014

Abstract-Cloud applications often rely on shared-nothing distributed databases that can sustain rapid growth in data volume. Distributed transactions (DTs) that involve data tuples from multiple geo-distributed servers can adversely impact the performance of such databases, especially when the transactions are short-lived in and require immediate response. The k-way min-cut graph clustering algorithm has been found effective to reduce the number of DTs with acceptable level of load balancing. Benefits of such a static partitioning scheme, however, is short-lived in Cloud applications with dynamically varying workload patterns where DT profile changes over time. This paper addresses this emerging challenge by introducing incremental repartitioning. In each repartitioning cycle, DT profile is learnt online and k-way min-cut clustering algorithm is applied on a special sub-graph representing all DTs as well as those non-DTs that have at least one tuple in a DT. The latter ensures that the min-cut algorithm minimally reintroduces new DTs from the nonDTs while maximally transforming existing DTs into non-DTs in the new partitioning. Potential load imbalance risk is mitigated by applying the graph clustering algorithm on the finer logical partitions instead of the servers and relying on random one-to-one cluster-to-partition mapping that naturally balances out loads. Inter-server data-migration due to repartitioning is kept in check with two special mappings favouring the current partition of majority tuples in a cluster-the many-to-one version minimising data migrations alone and the one-to-one version reducing data migration without affecting load balancing. A distributed data lookup process, inspired by the roaming protocol in mobile networks, is introduced to efficiently handle data migration without affecting scalability. The effectiveness of the proposed framework is evaluated on realistic TPC-C workloads comprehensively using graph, hypergraph, and compressed hypergraph representations used in the literature. Simulation results convincingly support incremental repartitioning against static partitioning.

Distributed Database Management Systems: Architectural Design Choices for the Cloud

2014

Development and Verification of Simulation Model Based on Real MANET Experiments for Transport Layer Protocols (UDP and TCP)

Hasan

Griffiths

et al. 2013

Int. J. Autom. Comput.

There is a lack of appropriate guidelines for realistic user traces, mobility models, routing protocols, considerations of real-life challenges, etc. for general-purpose mobile ad hoc networks (MANET). In this paper, four laptops are used in an open field environment in four scenarios to evaluate the performances of Internet control message protocol (ICMP) based ping and transmission control protocol (TCP) based streaming video applications using optimised link state routing (OLSR) implementation in an IEEE 802.11g wireless network. Corresponding simulations are developed in Network Simulator ns-2 by setting simulation parameters according to the real experiments. Difficulties faced to regenerate real-life scenarios have been discussed and the gaps between reality and simulation are identified. A setup guideline to produce realistic simulation results has been established.

Progressive Data Stream Mining and Transaction Classification for Workload-Aware Incremental Database Repartitioning

Gaber

2014

Minimising the impact of distributed transactions (DTs) in a shared-nothing distributed database is extremely challenging for transactional workloads. With dynamic workload nature and rapid growth in data volume the underlying database requires incremental repartitioning to maintain acceptable level of DTs and data load balance with minimum physical data migrations. In a workload-aware repartitioning scheme transactional workload is modelled as graph or hypergraph, and subsequently perform k-way min-cut clustering guaranteeing minimum edge cuts can reduce the impact of DTs significantly by mapping the workload clusters into logical database partitions. However, without exploring the inherent workload characteristics, the overall processing and computing times for large-scale workload networks increase in polynomial orders. In this paper, a workload-aware incremental database repartitioning technique is proposed, which effectively exploits proactive transaction classification and workload stream mining techniques.Workload batches are modelled in graph, hypergraph, and compressed hypergraph then repartitioned to produce a fresh tuple-to-partition data migration plan for every incremental cycle. Experimental studies in a simulated TPC-C environment demonstrate that the proposed model can be effectively adopted in managing rapid data growth and dynamic workloads, thus progressively reduce the overall processing time required to operate over the workload networks.