Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data
DOI: 10.1145/2588555.2593678

Materialization optimizations for feature selection workloads

Abstract: There is an arms race in the data management industry to support analytics, in which one critical step is feature selection, the process of selecting a feature set that will be used to build a statistical model. Analytics is one of the biggest topics in data management, and feature selection is widely regarded as the most critical step of analytics; thus, we argue that managing the feature selection process is a pressing data management challenge. We study this challenge by describing a feature-selection language…
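The core idea the abstract points at, reusing (materializing) intermediate feature computations across the many model fits that a feature-selection workload issues, can be illustrated with a minimal sketch. The greedy forward-selection loop and the in-memory cache below are illustrative assumptions of ours, not the paper's actual language or system.

```python
import numpy as np

def fit_least_squares(X, y):
    """Fit a linear model by ordinary least squares; return residual sum of squares."""
    coef, rss, *_ = np.linalg.lstsq(X, y, rcond=None)
    if rss.size == 0:  # lstsq omits residuals in the rank-deficient case
        rss = np.array([np.sum((y - X @ coef) ** 2)])
    return float(rss[0])

def greedy_forward_selection(columns, y, k):
    """Greedy forward selection that materializes (caches) each candidate column
    once, instead of recomputing it for every candidate model refit."""
    cache = {name: np.asarray(col, dtype=float) for name, col in columns.items()}
    selected = []
    for _ in range(k):
        best = min(
            (name for name in cache if name not in selected),
            key=lambda name: fit_least_squares(
                np.column_stack([cache[c] for c in selected + [name]]), y),
        )
        selected.append(best)
    return selected

# Tiny usage example with synthetic data.
rng = np.random.default_rng(0)
n = 200
cols = {f"f{i}": rng.normal(size=n) for i in range(5)}
y = 3.0 * cols["f1"] - 2.0 * cols["f3"] + rng.normal(scale=0.1, size=n)
print(greedy_forward_selection(cols, y, k=2))  # expect f1 and f3 to be chosen
```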

Cited by 69 publications (6 citation statements)
References 22 publications
“…We do so by formalizing partial CNN inference operations as first-class citizens for query processing. In doing so, our work expands a recent line of work on materialization optimizations for feature selection in linear models [168,291] and integrating ML with relational joins [174,91,248,175]. Finally, our work also expands the work in database systems on optimizing memory usage based on data access patterns of queries [98].…”
Section: Experimental Evaluation (mentioning)
confidence: 76%
“…We note however that, in practice, it is not very common to encounter such dense data sets with large number of columns [35].…”
Section: Fused Kernel For Dense Matrices (mentioning)
confidence: 97%
“…For a device with 48KB shared memory per SM, the limit on n is close to 6K. Such matrices with limited number of columns are indeed common in many enterprise workloads [35].…”
Section: Fused Kernel For Sparse Matrices (mentioning)
confidence: 99%
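The "close to 6K" figure quoted above is consistent with a simple back-of-the-envelope calculation, assuming one double-precision (8-byte) value per matrix column must reside in shared memory; the element-size assumption is ours, not stated in the excerpt.

```python
# Rough check of the quoted limit: 48 KB of shared memory per SM,
# assuming one 8-byte double per matrix column must be resident (our assumption).
shared_mem_bytes = 48 * 1024      # 48 KB per SM
bytes_per_element = 8             # double precision
max_columns = shared_mem_bytes // bytes_per_element
print(max_columns)                # 6144, i.e. "close to 6K"
```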
“…Feature engineering is a critical and resource-consuming task in the development of machine-learning solutions in general, and classifiers in particular [16,18,34]. In the framework proposed by Kimelfeld and Ré [22], the general goal is to utilize the database's knowledge of the raw data structure to provide automated assistance in feature engineering.…”
Section: Introduction (mentioning)
confidence: 99%