Query optimization is one of the main challenges of query processing in cloud environments. The query optimizer attempts to find the best execution plan among the possible query plans. The execution cost of a query is affected by several factors, including communication costs, unavailability of resources, and access to large distributed data sets. In addition, query optimization is known to be an NP-hard problem, and many researchers have focused on it in recent years. Several techniques have been proposed for solving this problem, and they fall into two main categories: deterministic and non-deterministic methods. These methods can be further divided into three subcategories: cost-based query plan enumeration, multiple query optimization, and adaptive query optimization. This paper presents the advantages and disadvantages of the algorithms for solving query optimization problems in cloud environments, and compares the techniques in terms of optimization, time, cost, efficiency, and scalability. Finally, some key areas for improving cloud query optimization mechanisms in the future are outlined.
KEYWORDS
cloud computing, database, query optimization, review
INTRODUCTION
Data transfer and resource sharing are facilitated by the rapid progress of distributed IT-based systems. 1,2 Cloud computing supports several computers through a network. 3 It has a large-scale distributed architecture and uses virtualized services to deliver requests to users. 4,5 Moreover, cloud computing provides important financial advantages and cooperation possibilities for organizations and institutions. 6 Cloud computing is defined as a distributed IT-based technology built on a service business model. 7 This paradigm provides many benefits for users, such as the provision of computing capabilities, heterogeneous network access, scalability, and elasticity with measured services. 8,9 Cloud computing gives shared access to a large pool of resources, including data storage, memory, processing, and virtual machines. 10 A cloud client, such as a web browser or mobile app, can be used to access these services. 11 Enormous amounts of data are retrieved from geo-distributed data sources, with cross-layer data-handling requirements, to drive changes in the business model. 12 Cloud storage, one of the main services provided by cloud computing, 13 allows users to store their data in virtual pools instead of on their own servers. 14 In addition, subscribers can access the data from any part of the cloud. 15 Therefore, reliability and availability are necessary for information recovery and query processing.

Query processing involves three main steps, as shown in Figure 1. First, the query is translated into a relational algebra expression. Second, an optimal evaluation plan for the query is generated. The query optimization is the main part o...