Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data
DOI: 10.1145/2588555.2593678

Materialization optimizations for feature selection workloads

Abstract: There is an arms race in the data management industry to support analytics, in which one critical step is feature selection, the process of selecting a feature set that will be used to build a statistical model. Analytics is one of the biggest topics in data management, and feature selection is widely regarded as the most critical step of analytics; thus, we argue that managing the feature selection process is a pressing data management challenge. We study this challenge by describing a feature-selection language…
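The core idea the abstract points at, reusing (materializing) intermediate feature computations across the many model fits that a feature-selection workload issues, can be illustrated with a minimal sketch. The greedy forward-selection loop and the in-memory cache below are illustrative assumptions of ours, not the paper's actual language or system.

```python
import numpy as np

def fit_least_squares(X, y):
    """Fit a linear model by ordinary least squares; return residual sum of squares."""
    coef, rss, *_ = np.linalg.lstsq(X, y, rcond=None)
    if rss.size == 0:  # lstsq omits residuals in the rank-deficient case
        rss = np.array([np.sum((y - X @ coef) ** 2)])
    return float(rss[0])

def greedy_forward_selection(columns, y, k):
    """Greedy forward selection that materializes (caches) each candidate column
    once, instead of recomputing it for every candidate model refit."""
    cache = {name: np.asarray(col, dtype=float) for name, col in columns.items()}
    selected = []
    for _ in range(k):
        best = min(
            (name for name in cache if name not in selected),
            key=lambda name: fit_least_squares(
                np.column_stack([cache[c] for c in selected + [name]]), y),
        )
        selected.append(best)
    return selected

# Tiny usage example with synthetic data.
rng = np.random.default_rng(0)
n = 200
cols = {f"f{i}": rng.normal(size=n) for i in range(5)}
y = 3.0 * cols["f1"] - 2.0 * cols["f3"] + rng.normal(scale=0.1, size=n)
print(greedy_forward_selection(cols, y, k=2))  # expect f1 and f3 to be chosen
```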

Cited by 69 publications (6 citation statements)
References 22 publications
“…We do so by formalizing partial CNN inference operations as first-class citizens for query processing. In doing so, our work expands a recent line of work on materialization optimizations for feature selection in linear models [168,291] and integrating ML with relational joins [174,91,248,175]. Finally, our work also expands the work in database systems on optimizing memory usage based on data access patterns of queries [98].…”
Section: Experimental Evaluation (mentioning)
confidence: 76%
“…We note however that, in practice, it is not very common to encounter such dense data sets with large number of columns [35].…”
Section: Fused Kernel For Dense Matrices (mentioning)
confidence: 97%
“…For a device with 48KB shared memory per SM, the limit on n is close to 6K. Such matrices with limited number of columns are indeed common in many enterprise workloads [35].…”
Section: Fused Kernel For Sparse Matrices (mentioning)
confidence: 99%
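The "close to 6K" figure quoted above is consistent with a simple back-of-the-envelope calculation, assuming one double-precision (8-byte) value per matrix column must reside in shared memory; the element-size assumption is ours, not stated in the excerpt.

```python
# Rough check of the quoted limit: 48 KB of shared memory per SM,
# assuming one 8-byte double per matrix column must be resident (our assumption).
shared_mem_bytes = 48 * 1024      # 48 KB per SM
bytes_per_element = 8             # double precision
max_columns = shared_mem_bytes // bytes_per_element
print(max_columns)                # 6144, i.e. "close to 6K"
```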
“…Feature engineering is a critical and resource-consuming task in the development of machine-learning solutions in general, and classifiers in particular [16,18,34]. In the framework proposed by Kimelfeld and Ré [22], the general goal is to utilize the database's knowledge of the raw data structure to provide automated assistance in feature engineering.…”
Section: Introduction (mentioning)
confidence: 99%