Cloud computing lets users lease resources instead of bearing the high cost of owning them. In the infrastructure-as-a-service model of the cloud computing paradigm, services are delivered through virtual machines (VMs) created by a hypervisor. This software is installed on a bare-metal server, called the host, and acts as a broker between the host's hardware and its VMs. The host is responsible for allocating the resources that its VMs require, such as CPU, RAM, and network bandwidth. Allocating resources to a VM is therefore equivalent to placing the VM on a host. In this paper, we propose a resource allocation model for a datacenter composed of clusters of hosts. The model is based on the birth–death process of queueing systems and on continuous-time Markov chains. We focus on RAM-intensive VMs and treat the allocation of RAM to a VM as a job in the queueing system. The purpose of the model is to keep the number of running hosts to a minimum while guaranteeing quality of service in terms of response time. When the utilization of the active hosts reaches a predefined upper threshold, a new host is activated to prevent response time violations; when utilization falls to a predefined lower threshold, one of the hosts can be deactivated. The experimental results show that, in the long run, the probability of handling a larger number of jobs increases.
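
As a rough illustration of the two ingredients named above, the following minimal sketch computes the stationary distribution of a finite birth–death chain and applies a two-threshold rule for activating and deactivating hosts. It is not the model proposed in this paper: the function names `stationary_distribution` and `next_host_count`, the rates, and the threshold values (0.8 and 0.3) are all hypothetical and chosen only for the example.

```python
from typing import List, Sequence


def stationary_distribution(birth: Sequence[float],
                            death: Sequence[float]) -> List[float]:
    """Stationary probabilities of a finite birth-death chain.

    birth[n] is the rate from state n to n + 1 and death[n] is the rate
    from state n + 1 back to n, so the chain has len(birth) + 1 states.
    Uses the product form pi_n proportional to prod(birth[i] / death[i]).
    """
    if len(birth) != len(death):
        raise ValueError("birth and death must have the same length")
    weights = [1.0]  # unnormalized weight of state 0
    for lam, mu in zip(birth, death):
        weights.append(weights[-1] * lam / mu)
    total = sum(weights)
    return [w / total for w in weights]


def next_host_count(active: int, utilization: float,
                    upper: float, lower: float, max_hosts: int) -> int:
    """Two-threshold scaling rule (illustrative assumption, not the paper's).

    Adds one host when utilization reaches `upper`, removes one when it
    falls to `lower`, and otherwise leaves the count unchanged. At least
    one host always stays active.
    """
    if utilization >= upper and active < max_hosts:
        return active + 1
    if utilization <= lower and active > 1:
        return active - 1
    return active


# Hypothetical example: up to 3 jobs in the system, arrival rate 1.0,
# service rate 2.0 in every state.
pi = stationary_distribution([1.0, 1.0, 1.0], [2.0, 2.0, 2.0])
mean_jobs = sum(n * p for n, p in enumerate(pi))

# Hypothetical thresholds: scale up at 80% utilization, down at 30%.
hosts_after_peak = next_host_count(2, 0.85, upper=0.8, lower=0.3, max_hosts=4)
hosts_after_lull = next_host_count(2, 0.20, upper=0.8, lower=0.3, max_hosts=4)
```

Keeping the upper and lower thresholds apart gives the rule hysteresis, so a host is not repeatedly switched on and off when utilization hovers near a single cutoff.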