A major challenge for global query optimization in a multidatabase system (MDBS) is the lack of local cost information at the global level, owing to local autonomy. Several methods for deriving local cost models have been proposed recently, but they are suitable only for a static multidatabase environment. In this paper, we propose a new multi-states query sampling method to develop local cost models for a dynamic environment. The system contention level at a dynamic local site is divided into a number of discrete contention states based on the costs of a probing query. To determine an appropriate set of contention states for a dynamic environment, we introduce two algorithms, based on iterative uniform partition and on data clustering, respectively. A qualitative variable indicates the contention state in the dynamic environment. The techniques of our previous (static) query sampling method, namely query sampling, automatic variable selection, regression analysis, and model validation, are extended to develop a cost model that incorporates this qualitative variable. Experimental results demonstrate that the new multi-states query sampling method is quite promising for developing useful cost models in a dynamic multidatabase environment.
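A minimal sketch of the two ingredients described above, under stated assumptions. Contention states are obtained from a single uniform partition of the observed probing-query costs into m equal-width intervals (the iteration and stopping rule of the actual iterative algorithm are not reproduced). The qualitative variable then enters a regression cost model as 0/1 indicator variables that shift the intercept per state, i.e. y = b0 + b1*x + sum_j d_j*Z_j with state 0 as the baseline. This additive form, the single explanatory variable x, and all function names are hypothetical choices for illustration; the paper's model may use more variables or interaction terms.

```python
def uniform_states(probe_costs, m):
    """Hypothetical helper: split [min, max] of observed probing costs into
    m equal-width intervals and return a function mapping a probing cost to
    a contention state index 0..m-1."""
    lo, hi = min(probe_costs), max(probe_costs)
    width = (hi - lo) / m

    def state_of(cost):
        if width == 0:
            return 0  # all probes cost the same: a single state
        k = int((cost - lo) / width)
        return min(max(k, 0), m - 1)  # clamp the upper boundary and out-of-range costs

    return state_of


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= f * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_cost_model(xs, states, ys, m):
    """Least-squares fit of y = b0 + b1*x + sum_{j=1}^{m-1} d_j * Z_j,
    where Z_j = 1 if the sample query ran in contention state j.
    Returns [b0, b1, d_1, ..., d_{m-1}] via the normal equations."""
    rows = [[1.0, x] + [1.0 if s == j else 0.0 for j in range(1, m)]
            for x, s in zip(xs, states)]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    return solve(xtx, xty)


def predict(coef, x, state):
    """Estimated cost of a query with explanatory value x run in the given state."""
    shift = coef[1 + state] if state > 0 else 0.0
    return coef[0] + coef[1] * x + shift
```

For example, with probing costs observed between 10 and 40 and m = 3, a probing cost of 25 falls into state 1; a new query's cost is then estimated with predict(coef, x, state) using the state indicated by the current probe.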
Accurate query cost estimation is crucial to query optimization in a multidatabase system. Several estimation techniques for a static environment have been suggested in the literature. To develop a cost model for a dynamic environment, we recently introduced a multistate query-sampling method. This technique has been shown to be promising for estimating the cost of a query run in any given contention state of a dynamic environment. In this paper, we study the new problem of estimating the cost of a large query that may experience multiple contention states. After discussing the limitations of two simple approaches, namely single-state analysis and average-cost analysis, we propose two novel techniques to tackle this problem. The first, called fractional analysis, is suitable for an environment that changes gradually and smoothly, while the second, called the probabilistic approach, is designed for an environment that changes rapidly and randomly. The former estimates a query cost by analyzing its fractions, and the latter estimates a query cost based on Markov chain theory. Related issues, including the development of cost formulas, error analysis, and a comparison of the different approaches, are also discussed. Experiments demonstrate that the proposed techniques are quite promising for solving this problem.
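The two techniques can be contrasted with a small sketch. Purely for illustration, it assumes that a large query is split into n equal fractions, that the cost of one fraction in each contention state is already known (for instance from a per-state cost model), and that in the probabilistic case the state moves between fractions according to a known transition matrix P. The discretization and the function names are hypothetical; the paper's actual cost formulas and error analysis are not reproduced here.

```python
def fractional_cost(frac_costs_by_state, state_seq):
    """Fractional analysis (assumed form): the state of each fraction is known
    or forecast, as in a slowly drifting environment, so the total cost is the
    sum of each fraction's cost in its own state."""
    return sum(frac_costs_by_state[s] for s in state_seq)


def step(dist, p):
    """One Markov-chain step: new_dist[j] = sum_i dist[i] * p[i][j]."""
    n = len(dist)
    return [sum(dist[i] * p[i][j] for i in range(n)) for j in range(n)]


def expected_markov_cost(frac_costs_by_state, p, start_state, n_fractions):
    """Probabilistic approach (assumed form): the state changes randomly between
    fractions according to transition matrix p, so the expected cost is
    sum_t pi_t . c, with pi_0 concentrated on start_state and pi_{t+1} = pi_t P."""
    k = len(frac_costs_by_state)
    dist = [1.0 if s == start_state else 0.0 for s in range(k)]
    total = 0.0
    for _ in range(n_fractions):
        # expected cost of this fraction under the current state distribution
        total += sum(d * c for d, c in zip(dist, frac_costs_by_state))
        dist = step(dist, p)
    return total
```

With per-fraction costs of 1 and 3 in two states and every transition probability equal to 0.5, a three-fraction query starting in state 0 has an expected cost of 1 + 2 + 2 = 5 under the Markov model, whereas fractional analysis of the known state sequence 0, 0, 1, 1 gives 1 + 1 + 3 + 3 = 8. The chain step is written out explicitly so the sketch needs no external libraries.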