It is well known that the performance of quicksort can be improved by selecting the median of a sample of elements as the pivot of each partitioning stage. For large samples the partitions are better, but the amount of additional comparisons and exchanges to find the median of the sample also increases. We show in this paper that the optimal sample size to minimize the average total cost of quicksort, as a function of the size n of the current subarray size, is a • √ n + o(√ n). We give a closed expression for a, which depends on the selection algorithm and the costs of elementary comparisons and exchanges. Moreover, we show that selecting the medians of the samples as pivots is not the best strategy when exchanges are much more expensive than comparisons. We also apply the same ideas and techniques to the analysis of quickselect and get similar results.
In a randomly grown binary search tree BST of size n, any fixed pattern occurs with a frequency that is on average proportional to n. Deviations from the average case are highly unlikely and well quantified by a Gaussian law. Trees with forbidden patterns occur with an exponentially small probability that is characterized in terms of Bessel functions. The results obtained extend to BSTs a type of property otherwise known for strings and combinatorial tree models. They apply to paged trees or to quicksort with halting on short subfiles. As a consequence, various pointer saving strategies for maintaining trees obeying the random BST model can be precisely quantified. The methods used are based on analytic models, especially bivariate generating function subjected to singularity perturba-Ž .
In this paper, we present randomized algorithms over binary search trees such that: (a) the insertion of a set of keys, in any fixed order, into an initially empty tree always produces a random binary search tree; (b) the deletion of any key from a random binary search tree results in a random binary search tree; (c) the random choices made by the algorithms are based upon the sizes of the subtrees of the tree; this implies that we can support accesses by rank without additional storage requirements or modification of the data structures; and (d) the cost of any elementary operation, measured as the number of visited nodes, is the same as the expected cost of its standard deterministic counterpart; hence, all search and update operations have guaranteed expected cost O(log
n
), but now irrespective of any assumption on the input distribution.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.