Query optimization, a central concern in distributed database systems, aims to find an execution plan that minimizes runtime. Because relations are replicated across sites, the optimization is particularly challenging, and for large-scale distributed databases it is an NP-hard problem. This paper therefore proposes an Artificial Bee Colony algorithm based on Genetic Operators (ABC-GO) to solve the join query optimization problem in distributed database systems. The method combines the global and local search capabilities of the ABC algorithm with genetic operators that create new candidate solutions, improving the performance of the ABC algorithm. The results show that the cost of query evaluation is minimized and the quality of the Top-k query plans is improved for a given distributed query. The method also decreases the overhead; however, it needs a longer execution time.
KEYWORDS
artificial bee colony optimization, distributed database, genetic operator, query plan optimization
INTRODUCTION
Due to the rapid growth of data in current systems, 1,2 a set of interrelated databases, known as a distributed database, is stored on multiple computers to improve performance, reliability, accessibility, and modularity compared to a conventional centralized database system. 3-7 In today's world, every user of the system, whether an employee or a customer, needs access to the company's databases. The data in distributed databases may be replicated at different sites according to a distributed allocation plan. 8-10 As a result, cooperation among different sites is needed to answer a user query. 11 Query optimization is a critical subject for the distributed database management system, because a poor choice of query execution plan leads to poor query-answering performance. 12
A query is a statement or a group of statements that performs database operations such as deleting, updating, and retrieving data. It plays a critical role in data management and retrieval. Commonly, distributed queries are more complicated than centralized queries because the data are distributed over different sites. 13 Consequently, the impact of efficient query processing is increasing in a large number of applications.
A common method for identifying the Top-k objects is to score all database objects using some function. The DQP (distributed query processing) strategy aims to generate query processing plans that reduce the amount of data transmission between sites and, as a result, reduce the response time of the distributed query. Query optimization is a critical subject for database management systems, 14 and its purpose is to define an appropriate execution plan for the user's query. 15 Because more than one plan exists for such a query, the cost of each plan should be computed; this cost depends significantly on the amount of participation and data transfer betwe...