Summary
The WAND processing strategy is a dynamic pruning algorithm designed for large-scale Web search engines, where fast response to queries is critical. WAND reduces the amount of computation by scoring only documents that may become part of the top‐k results. In this paper, we present two parallel strategies for the WAND algorithm and compare their performance on GPUs. In our first strategy (named size‐based), the posting lists are evenly partitioned among thread blocks. Our second strategy (named range‐based) partitions the posting lists according to document identifier intervals; thus, partitions may have different sizes. We also propose three threshold sharing policies, named Local, Safe‐R, and Safe‐WR, which emulate the global pruning technique of the WAND algorithm. We evaluated our proposals with different amounts of work, from short to extra‐large queries, using both single-query processing and batches of queries. Results show that the size‐based strategy achieves the highest speedups, but at the cost of low-quality results. The range‐based strategy retrieves the exact top‐k documents and maintains a good speedup. Moreover, both strategies scale as the amount of work increases. In addition, there is no significant difference in performance among the three threshold sharing policies.
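The difference between the two partitioning strategies can be illustrated with a minimal CPU-side sketch. It is not the authors' GPU implementation: the function names, the `(doc_id, score)` posting layout, and the sample boundaries are all assumptions made for illustration, and each returned chunk merely stands in for the work assigned to one thread block.

```python
from bisect import bisect_left

def size_based_partition(postings, num_parts):
    """Split one sorted posting list into num_parts chunks of near-equal length."""
    n = len(postings)
    bounds = [(i * n) // num_parts for i in range(num_parts + 1)]
    return [postings[bounds[i]:bounds[i + 1]] for i in range(num_parts)]

def range_based_partition(postings, doc_id_bounds):
    """Split one sorted posting list at document-identifier boundaries.

    doc_id_bounds = [b0, b1, ..., bp] defines p half-open intervals
    [b_i, b_{i+1}); chunk sizes depend on how many postings fall in each.
    """
    doc_ids = [doc_id for doc_id, _ in postings]
    cuts = [bisect_left(doc_ids, b) for b in doc_id_bounds]
    return [postings[cuts[i]:cuts[i + 1]] for i in range(len(cuts) - 1)]

# Hypothetical posting list: (doc_id, score contribution), sorted by doc_id.
postings = [(1, 0.5), (2, 0.1), (3, 0.9), (40, 0.3), (41, 0.7), (90, 0.2)]

size_chunks = size_based_partition(postings, 3)
range_chunks = range_based_partition(postings, [0, 30, 60, 100])

print([len(c) for c in size_chunks])   # equal-sized chunks
print([len(c) for c in range_chunks])  # sizes follow the doc_id distribution
```

In the range-based split, the same document-identifier interval can be applied to every posting list of a query, so all postings of a given document fall into the same partition. In the size-based split, chunk boundaries depend on each list's length, so a document's postings may end up in different partitions.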
Background: The maximum subsequence problem asks for a contiguous subsequence with the largest sum in a sequence of n numbers. Solutions to this problem are used in various branches of science, especially in applications of computational biology. The best sequential solution runs in O(n) time and uses dynamic programming. Although effective, this solution returns little information and ignores the possibility that more than one subsequence attains the maximum sum. In DNA analysis in particular, finding all maximum-sum subsequences also reveals all possible pathogenicity islands, which are stretches that are likely to cause certain diseases.
Methods: We present new Bulk Synchronous Parallel/Coarse-Grained Multicomputer (BSP/CGM) parallel algorithms that account for the existence of more than one subsequence of maximum sum and solve three problems: the longest maximum subsequence sum, the shortest maximum subsequence sum, and the number of disjoint subsequences of maximum sum. To the best of our knowledge, there are no parallel BSP/CGM algorithms for these related problems. Taking advantage of general-purpose graphics processing units (GPGPUs), we implemented our algorithms on multiple GPUs with Compute Unified Device Architecture (CUDA); for comparison purposes, we also developed MPI and OpenMP implementations.
Results: The algorithms achieved good speedups, as confirmed by experimental results. They use p processors. The algorithm for the maximum subsequence sum requires O(n/p) parallel time with a constant number of communication rounds; the algorithms for the related problems require O(log p) communication rounds, with O(n/p) local computation per round.
Conclusions: We conclude that our algorithms for the maximum subsequence sum and related problems are unique and effective. We also believe that the BSP/CGM model can guide parallel implementations on modern architectures such as GPGPU/CUDA. As future work, we intend to extend these results to arrays of higher dimensions and to compute all maximal subsequences in a given interval.
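For orientation, the O(n) sequential solution mentioned in the Background is commonly implemented as Kadane's scan. The sketch below pairs it with a deliberately simple O(n^2) enumeration of every contiguous run that attains the maximum, to show what the longest and shortest variants ask for. This is an illustrative sequential reference, not the BSP/CGM algorithms of the paper; the function names are hypothetical, and it assumes a non-empty run and integer inputs (floating-point sums would need a tolerance).

```python
def max_subsequence_sum(values):
    """Kadane's O(n) scan: largest sum of a non-empty contiguous run."""
    best = current = values[0]
    for x in values[1:]:
        # Either extend the run ending at the previous element or restart at x.
        current = max(x, current + x)
        best = max(best, current)
    return best

def all_max_runs(values):
    """O(n^2) reference: every inclusive (start, end) whose sum is the maximum."""
    target = max_subsequence_sum(values)
    runs = []
    for start in range(len(values)):
        total = 0
        for end in range(start, len(values)):
            total += values[end]
            if total == target:
                runs.append((start, end))
    return runs

sample = [2, -2, 2]
runs = all_max_runs(sample)
shortest = min(runs, key=lambda r: r[1] - r[0])
longest = max(runs, key=lambda r: r[1] - r[0])

print(max_subsequence_sum(sample))  # 2
print(runs)                         # [(0, 0), (0, 2), (2, 2)]
print(shortest, longest)            # (0, 0) (0, 2)
```

For `[2, -2, 2]` the maximum sum 2 is attained by three different runs, which is exactly the information a plain maximum-sum scan discards.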