Rasmus Resen Amossen scite author profile

2009

Computing an equi-join followed by a duplicate eliminating projection is conventionally done by performing the two operations in serial. If some join attribute is projected away the intermediate result may be much larger than both the input and the output, and the computation could therefore potentially be performed faster by a direct procedure that does not produce such a large intermediate result. We present a new algorithm that has smaller intermediate results on worst-case inputs, and in particular is more efficient in both the RAM and I/O model. It is easy to see that join-project where the join attributes are projected away is equivalent to boolean matrix multiplication. Our results can therefore also be interpreted as improved sparse, output-sensitive matrix multiplication.

A New Data Layout for Set Intersection on GPUs

2011

Abstract-Set intersection is the core in a variety of problems, e.g. frequent itemset mining and sparse boolean matrix multiplication. It is well-known that large speed gains can, for some computational problems, be obtained by using a graphics processing unit (GPU) as a massively parallel computing device. However, GPUs require highly regular control flow and memory access patterns, and for this reason previous GPU methods for intersecting sets have used a simple bitmap representation. This representation requires excessive space on sparse data sets. In this paper we present a novel data layout, BATMAP, that is particularly well suited for parallel processing, and is compact even for sparse data.Frequent itemset mining is one of the most important applications of set intersection. As a case-study on the potential of BATMAPs we focus on frequent pair mining, which is a core special case of frequent itemset mining. The main finding is that our method is able to achieve speedups over both Apriori and FP-growth when the number of distinct items is large, and the density of the problem instance is above 1%. Previous implementations of frequent itemset mining on GPU have not been able to show speedups over the best single-threaded implementations.

Better Size Estimation for Sparse Matrix Products

Amossen¹,

Campagna

2013

Algorithmica

Abstract. We consider the problem of doing fast and reliable estimation of the number of non-zero entries in a sparse boolean matrix product. Let n denote the total number of non-zero entries in the input matrices. We show how to compute a 1 ± ε approximation (with small probability of error) in expected time O(n) for any ε > 4/ 4 √ n. The previously best estimation algorithm, due to Cohen (JCSS 1997), uses time O(n/ε 2 ). We also present a variant using O(sort(n)) I/Os in expectation in the cache-oblivious model. We also describe how sampling can be used to maintain (independent) sketches of matrices that allow estimation to be performed in time o(n) if z is sufficiently large. This gives a simpler alternative to the sketching technique of Ganguly et al. (PODS 2005), and matches a space lower bound shown in that paper.

Better Size Estimation for Sparse Matrix Products

Campagna

2010

Multi-dimensional bin packing problems with guillotine constraints

Computers & Operations Research

Pisinger²

2010

The problem addressed in this paper is the decision problem of determining if a set of multi-dimensional rectangular boxes can be orthogonally packed into a rectangular bin while satisfying the requirement that the packing should be guillotine cuttable. That is, there should exist a series of face parallel straight cuts that can recursively cut the bin into pieces so that each piece contains a box and no box has been intersected by a cut. The unrestricted problem is known to be NP-hard. In this paper we present a generalization of a constructive algorithm for the multi-dimensional bin packing problem, with and without the guillotine constraint, based on constraint programming.