2016
DOI: 10.1145/2877204

Materialization Optimizations for Feature Selection Workloads

Abstract: There is an arms race in the data management industry to support analytics, in which one critical step is feature selection, the process of selecting a feature set that will be used to build a statistical model. Analytics is one of the biggest topics in data management, and feature selection is widely regarded as the most critical step of analytics; thus, we argue that managing the feature selection process is a pressing data management challenge. We study this challenge by describing a feature-selection langu…

Cited by 101 publications (87 citation statements)
References 28 publications
“…Application in a System: Recent systems such as Columbus [20,33] and MLBase [21] provide a high-level language that includes both relational and ML operations. Such systems optimize the execution of logical ML computations by choosing among alternative physical plans using cost models.…”
Section: Results (mentioning)
confidence: 99%
“…There is increasing research and industrial interest in building systems that achieve closer integration of ML with data processing. These include systems that combine linear algebra-based languages with data management platforms [4,15,34], systems for Bayesian inference [9], systems for graph-based ML [23], and systems that combine dataflow-based languages for ML with data management platforms [21,22,33]. None of these systems address the problem of learning over joins, but we think our work is easily applicable to the last group of systems.…”
Section: Related Work (mentioning)
confidence: 99%
“…There are different types of DML such as: Tasks : ( for further clarification please refer to MLbase [18,21], (fixed task) Columbus [25], DeepDive [20])…”
Section: A Distributed Machine Learning and Data Mining Techniques (mentioning)
confidence: 99%