Summary: NoSQL distributed databases are often used as Big Data platforms. To provide efficient resource sharing and cost effectiveness, such distributed databases typically run concurrently on a virtualized infrastructure that can be implemented using hypervisor‐based virtualization or container‐based virtualization. Hypervisor‐based virtualization is a mature technology but imposes overhead on CPU, networking, and disk. Container‐based virtualization has recently become more popular because it shares operating system resources and simplifies the deployment of applications. This article presents a performance comparison between multiple instances of VMware VMs and Docker containers running concurrently. Our workload models a real‐world Big Data Apache Cassandra application from Ericsson. As a baseline, we evaluated the performance of Cassandra when running on the nonvirtualized physical infrastructure. Our study shows that Docker has lower overhead than VMware; the performance on the container‐based infrastructure was as good as on the nonvirtualized infrastructure. Our performance evaluations also show that running multiple instances of a Cassandra database concurrently affected read and write operations differently: for both VMware and Docker, the maximum number of read operations decreased when we ran several instances concurrently, whereas the maximum number of write operations increased.
Apache Cassandra is a highly scalable and available NoSQL datastore, widely used by enterprises of every size and in application areas that range from entertainment to big data analytics. Managed Cassandra service providers are emerging to hide the complexity of installing, fine-tuning, and operating Cassandra virtual data centers (VDCs). This paper addresses the problem of energy-efficient autoscaling of Cassandra VDCs in managed Cassandra data centers. We propose three energy-aware autoscaling algorithms: Opt, LocalOpt and LocalOpt-H. The first provides the optimal scaling decision, orchestrating horizontal and vertical scaling and optimal placement. The other two are heuristics and provide sub-optimal solutions. Both orchestrate horizontal scaling and optimal placement; LocalOpt also considers vertical scaling. In this paper, we provide an analysis of the computational complexity of the optimal and heuristic autoscaling algorithms; we discuss the issues in autoscaling Cassandra VDCs and provide best practices for using autoscaling algorithms; and we evaluate the performance of the proposed algorithms under programmed SLA variation, unexpected surges of throughput, and failures of physical nodes. We also compare the performance of the energy-aware autoscaling algorithms with that of two energy-blind autoscaling algorithms, namely BestFit and BestFit-H. Aiming at reducing energy consumption, or resource usage in general, can heavily reduce the reliability of Cassandra in terms of the consistency level offered. Horizontal scaling of Cassandra is very slow, which makes surges of throughput hard to manage. Vertical scaling is a valid alternative, but it is not supported by all cloud infrastructures.