Fast k-selection algorithms for graphics processing units

Alabi, Tolu; Blanchard, Jeffrey D.; Gordon, Bradley S.; Steinbach, Russel

doi:10.1145/2133803.2345676

Cited by 38 publications (49 citation statements). References 12 publications.


“…2.1, random sampling similar to bootstrapping and the technique used in randomizedSelect [Monroe et al 2011] permits a rapid definition of buckets which will each contain a roughly uniform number of values from the full data set. The fast linear projection into buckets is borrowed from bucketSelect [Alabi et al 2012]. Finally, the guarantees of sort&choose are applied to a reduced vector containing the candidates for the set of desired order statistics.…”
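The three ingredients named in this statement are sample-based buckets, projecting every value into a bucket, and sort&choose on a reduced candidate vector. They can be sketched sequentially in Python. This is a hypothetical illustration under stated assumptions, not the bucketMultiSelect GPU code. The function name multi_select_sketch and its parameters are invented here. Splitters are taken from a sorted random sample, and bisect-based bucket assignment stands in for the fast linear projection mentioned in the quote.

```python
import bisect
import random


def multi_select_sketch(values, ranks, num_buckets=64, sample_size=1024, seed=0):
    """Return the order statistics of `values` at the 1-based positions in `ranks`.

    Hypothetical sequential sketch of sample-based bucketing followed by
    sort&choose on the candidate buckets; not the bucketMultiSelect GPU code.
    """
    n = len(values)
    rng = random.Random(seed)

    # 1. Random sample (with replacement) -> splitters that cut the data into
    #    buckets holding roughly equal numbers of values.
    sample = sorted(rng.choice(values) for _ in range(min(sample_size, n)))
    step = max(1, len(sample) // num_buckets)
    splitters = sample[step::step]

    # 2. Assign every value to a bucket. bisect stands in for the fast linear
    #    projection used on the GPU (an assumption of this sketch).
    bucket_of = [bisect.bisect_right(splitters, v) for v in values]
    counts = [0] * (len(splitters) + 1)
    for b in bucket_of:
        counts[b] += 1

    # 3. Prefix sums: bucket b holds the 0-based sorted positions
    #    starts[b] .. starts[b + 1] - 1.
    starts = [0]
    for c in counts:
        starts.append(starts[-1] + c)

    def bucket_for(k):
        return bisect.bisect_right(starts, k - 1) - 1

    # 4. sort&choose on the reduced vector: keep and sort only the buckets
    #    that contain a requested rank.
    members = {bucket_for(k): [] for k in ranks}
    for v, b in zip(values, bucket_of):
        if b in members:
            members[b].append(v)
    for bucket in members.values():
        bucket.sort()

    result = []
    for k in ranks:
        b = bucket_for(k)
        result.append(members[b][k - 1 - starts[b]])
    return result
```

Only the buckets that contain a requested rank are sorted. That is where the savings over sorting the full vector come from.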

Section: An Algorithm For Selecting Multiple Order Statistics: Bucket (mentioning; confidence 99%)

“…While bucketSelect is the fastest algorithm on non-adversarial distributions, the algorithm struggles when faced with adversarial vectors [Alabi et al 2012]. In bucketSelect the buckets are defined as equal-width intervals from the minimum to maximum value in the vector.…”
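The following minimal sequential sketch shows the equal-width bucketing described here. It projects each value linearly into one of a fixed number of equal-width buckets between the current minimum and maximum. It then keeps only the bucket containing rank k and repeats on that bucket. The bucket count of 1024, the 1-based rank, and the name bucket_select_sketch are illustrative assumptions, not details taken from the GGKS code.

```python
def bucket_select_sketch(values, k, num_buckets=1024):
    """Return the k-th smallest (1-based) element of `values`.

    Hypothetical sketch of distributive partitioning with equal-width buckets.
    """
    vec = list(values)
    while True:
        lo, hi = min(vec), max(vec)
        if lo == hi:
            return lo  # every remaining value is equal, so any one is the answer
        # Linear projection onto equal-width intervals spanning [lo, hi].
        scale = num_buckets / (hi - lo)
        counts = [0] * num_buckets
        ids = []
        for v in vec:
            b = min(int((v - lo) * scale), num_buckets - 1)
            ids.append(b)
            counts[b] += 1
        # Running count locates the bucket holding rank k.
        below = 0
        for b, c in enumerate(counts):
            if below + c >= k:
                break
            below += c
        # Keep only that bucket and shift k to a rank within it. lo and hi
        # always fall in different buckets, so the vector strictly shrinks.
        vec = [v for v, i in zip(vec, ids) if i == b]
        k -= below
```

An input of many values in [0, 1) plus a single outlier at 1e9 shows the weakness mentioned in the quote. With 1024 equal-width buckets, every value below 1 projects into bucket 0, so the first pass barely shrinks the problem.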

Section: An Algorithm For Selecting Multiple Order Statistics: Bucket (mentioning; confidence 99%)

“…In 2011, several GPU selection algorithms were announced including an optimization based algorithm cuttingPlane [Beliakov 2011], a randomized but deterministic selection randomizedSelect [Monroe et al 2011], a radix selection radixSelect [Alabi et al 2012], and an algorithm based on distributive partitioning bucketSelect [Alabi et al 2012]. The performance of these four algorithms was extensively compared in [Alabi et al 2012] and all four algorithms are implemented in the software GGKS: Grinnell GPU k-selection [Alabi et al 2011].…”

Section: Introduction (mentioning; confidence 99%)

“…The fastest known sorting algorithm for GPUs is Merrill and Grimshaw's radix sort [Merrill and Grimshaw 2011] implemented as thrust::sort in the Thrust library [Hoberock and Bell 2010]. In [Alabi et al 2012], all four GPU selection algorithms were shown to select any order statistic from data sets with more than 2^20 numerical entries in less time than sort&choose. Moreover, bucketSelect and randomizedSelect were observed to have comparable speeds on vectors of floats and are consistently faster than the optimization and radix selection algorithms.…”

Section: Introduction (mentioning; confidence 99%)


Selecting Multiple Order Statistics with a Graphics Processing Unit

Blanchard, Opavsky, Uysaler (2016). ACM Trans. Parallel Comput. (self-citation)

Extracting a set of multiple order statistics from a huge data set provides important information about the distribution of the values in the full set of data. This article introduces an algorithm, bucketMultiSelect, for simultaneously selecting multiple order statistics with a graphics processing unit (GPU). Typically, when a large set of order statistics is desired, the vector is sorted. When the sorted version of the vector is not needed, bucketMultiSelect significantly reduces computation time by eliminating a large portion of the unnecessary operations involved in sorting. For large vectors, bucketMultiSelect returns thousands of order statistics in less time than sorting the vector while typically using less memory. For vectors containing 2^28 values of type double, bucketMultiSelect selects the 101 percentile order statistics in less than 200 ms and is more than 10× faster than sorting the vector with a GPU-optimized radix sort.
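For reference, the sort&choose baseline that bucketMultiSelect is compared against amounts to sorting once and then indexing. The sketch below assumes one particular convention for the 101 percentile ranks: rank 1 + floor(i(n - 1)/100) for i = 0, 1, ..., 100. The abstract does not state which convention the paper uses, and both function names are hypothetical.

```python
def percentile_ranks(n, count=101):
    """Hypothetical convention: 1-based ranks of `count` evenly spaced
    percentiles, from the minimum (rank 1) to the maximum (rank n)."""
    return [1 + (i * (n - 1)) // (count - 1) for i in range(count)]


def sort_and_choose(values, ranks):
    """Baseline: sort the whole vector once, then read off each 1-based rank."""
    ordered = sorted(values)
    return [ordered[k - 1] for k in ranks]
```

For n = 1001 this convention yields ranks 1, 11, 21, ..., 1001. These are spaced 10 apart, running from the minimum to the maximum. The baseline pays for a full sort no matter how few ranks are requested.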

“…We use a GPU k-selection algorithm to select kNNs with the smallest DTW distance from all unfiltered candidates. The main technique is distributive partitioning for k-selection on the GPU [3]. We adopt the existing work for GPU k-selection [3] but with two incremental improvements: (1) we use one block to handle one k-selection for one query to support multiple k-selections; (2) we return all k smallest segments instead of only the k-th one.…”
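The second improvement quoted here, returning all k smallest values rather than only the k-th, follows naturally from distributive partitioning. Every bucket lying entirely below the bucket that contains the k-th value can be accepted wholesale, and only the straddling bucket needs further refinement. The sketch below is a sequential, single-query illustration under that assumption. It is not the SMiLer implementation or the code of [3], and the name k_smallest_indices is hypothetical.

```python
def k_smallest_indices(dists, k, num_buckets=256):
    """Return indices of the k smallest entries of `dists` (ties broken
    arbitrarily), using repeated equal-width bucketing instead of a full sort."""
    chosen = []                     # indices known to be among the k smallest
    cand = list(range(len(dists)))  # indices still undecided
    need = k                        # how many more indices to take from cand
    while need > 0:
        vals = [dists[i] for i in cand]
        lo, hi = min(vals), max(vals)
        if lo == hi:                # all candidates tie: any `need` of them will do
            chosen.extend(cand[:need])
            break
        scale = num_buckets / (hi - lo)
        buckets = [[] for _ in range(num_buckets)]
        for i, v in zip(cand, vals):
            buckets[min(int((v - lo) * scale), num_buckets - 1)].append(i)
        for b in buckets:           # buckets are visited in increasing value order
            if len(b) <= need:      # every value in this bucket is among the k smallest
                chosen.extend(b)
                need -= len(b)
            else:                   # this bucket straddles the k-th value: refine it
                cand = b
                break
    return chosen
```

Calling this function once per query corresponds loosely to the one-block-per-query scheme described in the quote, with a Python call playing the role of a GPU thread block.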

Section: kNNs: Filtering Verification and Selection (mentioning; confidence 99%)

SMiLer

Zhou, Tung (2015). Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data

It is useful to predict future values in time series data, for example when there are many sensors monitoring environments such as urban space. The Gaussian Process (GP) model is considered a promising technique for this setting. However, the GP model requires too high a training cost to be tractable for large data. Though approximate methods have been proposed to improve GP's scalability, they usually can only capture global trends in the data and fail to preserve small-scale patterns, resulting in unsatisfactory performance. We propose a new method to apply the GP for sensor time series prediction. Instead of (eagerly) training GPs on entire datasets, we custom-build query-dependent GPs on small fractions of the data for each prediction request. Implementing this idea in practice at scale requires us to overcome two obstacles. On the one hand, a central challenge with such a semi-lazy learning model is the substantial model-building effort at kNN query time, which could lead to unacceptable latency. We propose a novel two-level inverted-like index to support kNN search using the DTW on the GPU, making such "just-in-time" query-dependent model construction feasible for real-time applications. On the other hand, several parameters should be tuned for each time series individually since different sensors have different data generating processes in diverse environments. Manually configuring the parameters is usually not feasible due to the large number of sensors. To address this, we devise an adaptive auto-tuning mechanism to automatically determine and dynamically adjust the parameters for each time series with little human assistance. Our method has the following strong points: (a) it can make predictions in real time without a training phase; (b) it can yield superior prediction accuracy; and (c) it can effectively estimate the analytical predictive uncertainty. To illustrate our points, we present SMiLer, a semi-lazy time series prediction system for sensors. Extensive experiments on real-world datasets demonstrate its effectiveness and efficiency. In particular, by devising a two-level inverted-like index on the GPU with an enhanced lower bound of the DTW, SMiLer accelerates kNN search by one order of magnitude over its baselines. The prediction accuracy of SMiLer is better than that of the state-of-the-art competitors (up to 10 competitors), with better estimation of predictive uncertainty.

Manycore GPU processing of repeated range queries over streams of moving objects observations

Lettich, Orlando, Silvestri, et al. (2016). Concurrency and Computation

The ability to process significant amounts of continuously updated spatial data in a timely manner is mandatory for an increasing number of applications. Parallelism enables such applications to face this data-intensive challenge and allows the devised systems to feature low latency and high scalability. In this paper, we focus on a specific data-intensive problem concerning the repeated processing of huge amounts of range queries over massive sets of moving objects, where the spatial extent of queries and objects is continuously modified over time. To tackle this problem and significantly accelerate query processing, we devise a hybrid CPU/GPU pipeline that compresses data output and saves query processing work. The devised system relies on an ad hoc spatial index leading to a problem decomposition that results in a set of independent data-parallel tasks. The index is based on a point-region quadtree space decomposition and allows the system to effectively handle a broad range of spatial object distributions, even very skewed ones. Also, to deal with the architectural peculiarities and limitations of GPUs, we adopt non-trivial GPU data structures that avoid the need for locked memory accesses while favouring coalesced memory accesses, thus enhancing the overall memory throughput. To the best of our knowledge, this is the first work that exploits GPUs to efficiently solve repeated range queries over massive sets of continuously moving objects, possibly characterized by highly skewed spatial distributions. In comparison with state-of-the-art CPU-based implementations, our method achieves significant speedups on the order of 10–20×, depending on the dataset.
