Proceedings of the 37th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems 2018
DOI: 10.1145/3196959.3196960

In-Database Learning with Sparse Tensors

Abstract: In-database analytics is of great practical importance as it avoids the costly repeated loop data scientists have to deal with on a daily basis: select features, export the data, convert data format, train models using an external tool, reimport the parameters. It is also a fertile ground of theoretically fundamental and challenging problems at the intersection of relational and statistical data models. This paper introduces a unified framework for training and evaluating a class of statistical learning models …

Cited by 66 publications (97 citation statements)
References 38 publications
“…In-database machine learning algorithms is a growing class of algorithms that aims to learn in time sublinear in the input data a.k.a. the design matrix [22,2,11,3,18,19]. The trick is that the design matrix J often happens to be the output of some database query Q whose size could be much larger than the size of its input tables T_1, …”
Section: Related Results (mentioning)
confidence: 99%
“…By pushing machine learning algorithms down the database engine, we could run some of them in time max_j |T_j| ≪ |J|, hence sublinear in |J|. This however often requires the database engine to be capable of efficiently solving a large number of aggregate queries [3,2], many of which can be modeled as FAQs [5] or FAQ-AIs [1]. FAQ-AIs studied in this paper have been used as the building blocks of many in-database algorithms including k-means clustering, support vector machines, and polynomial regression [1,3].…”
Section: Related Results (mentioning)
confidence: 99%
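To make the "sublinear in |J|" point in the statement above concrete, here is a minimal Python sketch. It is not taken from the cited papers: the relations R(a, x) and S(a, y), the aggregate SUM(x * y), and both function names are assumptions chosen for illustration. The first function materializes the join before aggregating; the second pushes partial sums below the join and only scans the input tables.

```python
from collections import defaultdict


def sum_xy_naive(R, S):
    """Materialize R(a, x) JOIN S(a, y), then compute SUM(x * y) over it.

    Returns the aggregate and the size of the materialized join.
    """
    join = [(a, x, y) for (a, x) in R for (b, y) in S if a == b]
    return sum(x * y for (_, x, y) in join), len(join)


def sum_xy_pushed(R, S):
    """Compute the same SUM(x * y) without materializing the join.

    For each join key a, the sum over joined pairs of x * y factors into
    (sum of x in R with key a) * (sum of y in S with key a), so one pass
    over each input table is enough.
    """
    sum_x = defaultdict(int)
    sum_y = defaultdict(int)
    for a, x in R:
        sum_x[a] += x
    for a, y in S:
        sum_y[a] += y
    # Only keys present in both tables contribute to the join.
    return sum(sum_x[a] * sum_y[a] for a in sum_x if a in sum_y)


if __name__ == "__main__":
    R = [(0, 1), (0, 2), (0, 3), (1, 10)]
    S = [(0, 1), (0, 2), (0, 3)]
    print(sum_xy_naive(R, S))
    print(sum_xy_pushed(R, S))
```

In this toy case every tuple with key 0 in R joins with every tuple with key 0 in S, so the join grows with the product of the group sizes while the pushed-down version stays linear in the size of the inputs.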
“…A similar generalization works for factorization machines [5,40]. Categorical attributes can be accommodated as for linear regression and then each categorical attribute X_j with exponent a_j > 0 becomes a group-by attribute.…”
Section: Applications (mentioning)
confidence: 99%
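One way to read "each categorical attribute becomes a group-by attribute" is sketched below in Python. This is an illustrative assumption, not code from the cited work: the rows are hypothetical (category, value) pairs, and the function names are invented. Summing a numeric value against a one-hot encoding of a categorical column yields one entry per category, most of which may be zero; a group-by over the same column stores only the categories that actually occur.

```python
from collections import defaultdict


def onehot_aggregate(rows, categories):
    """Dense form: one SUM(x * [c == v]) entry for every category v."""
    return [sum(x for (c, x) in rows if c == v) for v in categories]


def groupby_aggregate(rows):
    """Sparse form: SUM(x) GROUP BY c, keeping only categories that occur."""
    out = defaultdict(int)
    for c, x in rows:
        out[c] += x
    return dict(out)


if __name__ == "__main__":
    rows = [("red", 2), ("blue", 3), ("red", 5)]
    categories = ["red", "green", "blue"]
    print(onehot_aggregate(rows, categories))
    print(groupby_aggregate(rows))
```

The group-by result carries the same information as the dense vector, since any category missing from it has aggregate zero.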
“…Instead of representing such a Cartesian product of two relation parts explicitly as done by relational database systems, we can represent it symbolically as a tree whose root is the Cartesian product symbol and has as children the two relation parts. It has been shown that factorization can improve the performance of joins [42], aggregates [9,6], and more recently machine learning [51,41,4,2]. The additive inverse of rings allows to treat uniformly data updates (inserts and deletes) and enables incremental maintenance of models learned over relational data [28,39,27].…”
Section: Structure-aware Learning (mentioning)
confidence: 99%
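The remark above about additive inverses can be illustrated with a small Python sketch. It assumes a hypothetical view that maintains SUM(x * y) over R(a, x) JOIN S(a, y); the class and method names are invented, and this is not the maintenance algorithm of the cited papers. Inserts carry multiplicity +1 and deletes carry -1, so both go through the same update rule.

```python
from collections import defaultdict


class JoinSumView:
    """Maintains SUM(x * y) over R(a, x) JOIN S(a, y) under inserts and deletes."""

    def __init__(self):
        self.sum_x = defaultdict(int)  # per-key sum of x over R
        self.sum_y = defaultdict(int)  # per-key sum of y over S
        self.total = 0                 # current value of SUM(x * y)

    def update_R(self, a, x, mult=1):
        """Apply an insert (mult=+1) or delete (mult=-1) of (a, x) in R."""
        # The change in the aggregate is the new tuple's value times the
        # matching partial sum on the other side of the join.
        self.total += mult * x * self.sum_y[a]
        self.sum_x[a] += mult * x

    def update_S(self, a, y, mult=1):
        """Apply an insert (mult=+1) or delete (mult=-1) of (a, y) in S."""
        self.total += mult * y * self.sum_x[a]
        self.sum_y[a] += mult * y


if __name__ == "__main__":
    view = JoinSumView()
    view.update_R(0, 2)
    view.update_S(0, 3)
    view.update_R(0, 4)
    view.update_R(0, 2, mult=-1)  # delete the first R tuple
    print(view.total)
```

Because a delete is just an update with a negated multiplicity, no separate code path is needed, which is the uniform treatment the statement attributes to ring structure.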