Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data 2015
DOI: 10.1145/2723372.2723713
Learning Generalized Linear Models Over Normalized Data

Abstract: Enterprise data analytics is a booming area in the data management industry. Many companies are racing to develop toolkits that closely integrate statistical and machine learning techniques with data management systems. Almost all such toolkits assume that the input to a learning algorithm is a single table. However, most relational datasets are not stored as single tables due to normalization. Thus, analysts often perform key-foreign key joins before learning on the join output. This strategy of learning afte…

Cited by 126 publications (118 citation statements); references 25 publications.
“…In-database machine learning is a growing class of algorithms that aim to learn in time sublinear in the input data, a.k.a. the design matrix [22,2,11,3,18,19]. The trick is that the design matrix J often happens to be the output of some database query Q whose size could be much larger than the size of its input tables T1, .…”
Section: Related Results
confidence: 99%
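The quoted point — that the join output J can be much larger than the base tables it is derived from — is what makes learning over normalized data attractive. A minimal sketch, with entirely hypothetical data, of pushing an aggregate past a key-foreign key join so the work stays linear in the base tables rather than in the join output:

```python
from collections import defaultdict

# Hypothetical tables: fact table S holds (foreign_key, x) pairs,
# dimension table R maps each key to a value y.
S = [(1, 2.0), (1, 3.0), (2, 5.0), (2, 1.0), (2, 4.0)]
R = {1: 10.0, 2: 20.0}

# Naive: materialize the join S ⋈ R, then sum x * y over every joined row.
joined = [(x, R[k]) for (k, x) in S]
naive_sum = sum(x * y for (x, y) in joined)

# Factorized: accumulate partial sums of x per key in one pass over S,
# then combine with R in one pass over R — the join is never materialized.
partial = defaultdict(float)
for k, x in S:
    partial[k] += x
factorized_sum = sum(partial[k] * y for k, y in R.items())

assert naive_sum == factorized_sum  # 5*10 + 10*20 = 250
```

The same push-down idea underlies the sublinear-in-J algorithms cited above: the aggregates a learning algorithm needs decompose over the query that defines the design matrix.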
“…Example: Let ε = 1 and A be an array having the weights (10, 12, 13, 14, 15, 16, 17, 18, 19); then the ε-sketch of A, denoted A′, will be an array that has the weights of indices (1, 2, 4, 8) in A (the weights of index 16 or higher are assumed to be ∞); thus, A′ = {10, 12, 14, 14, 18, 18, 18, 18}.…”
Section: Approximate Inequality Row Counting
confidence: 99%
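The ε-sketch in the quoted example keeps only the weights at indices that are powers of (1 + ε), rounding every other index up to the next such power. A small sketch of that construction, assuming 1-based indexing and a non-decreasing weight array as in the quote:

```python
import math

def eps_sketch(A, eps=1.0):
    """Build the ε-sketch of a non-decreasing 1-based weight array A.

    Each index i is rounded up to the nearest power of (1 + eps); indices
    that round past the end of A are treated as weight ∞ and omitted,
    matching the quoted example.
    """
    n = len(A)
    sketch = []
    for i in range(1, n + 1):
        j = 1
        while j < i:                      # round i up to a power of (1 + eps)
            j = math.ceil(j * (1 + eps))
        if j <= n:
            sketch.append(A[j - 1])
    return sketch

A = [10, 12, 13, 14, 15, 16, 17, 18, 19]
print(eps_sketch(A, eps=1.0))  # [10, 12, 14, 14, 18, 18, 18, 18]
```

With ε = 1 the retained indices are the powers of two (1, 2, 4, 8), reproducing A′ from the example; index 9 rounds up to 16 and is dropped.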
“…Further examples in this category are: Orion [30] and Hamlet [31], which support generalized linear models and Naïve Bayes classification; recent efforts on scaling linear algebra using existing distributed database systems [32]; the declarative language BUDS [20], whose compiler can perform deep optimizations of the user's program; and Morpheus [14]. Morpheus factorizes the computation of the linear algebra operators summation, matrix multiplication, pseudo-inverse, and element-wise operations over training datasets defined by key-foreign key star or chain joins.…”
Section: Related Work
confidence: 99%
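Factorizing a linear algebra operator over a key-foreign key join, as the quote attributes to Morpheus, means multiplying each base table once instead of multiplying the materialized join. A hypothetical sketch with random data (names and shapes are illustrative, not Morpheus's API): the training matrix T is the join of a fact table with features X_S and a dimension table with features X_R, and T @ w is computed without building T.

```python
import numpy as np

rng = np.random.default_rng(0)
n_s, n_r, d_s, d_r = 6, 3, 2, 4
X_S = rng.standard_normal((n_s, d_s))   # fact-table features
X_R = rng.standard_normal((n_r, d_r))   # dimension-table features
fk = rng.integers(0, n_r, size=n_s)     # foreign key of each fact row

w = rng.standard_normal(d_s + d_r)
w_S, w_R = w[:d_s], w[d_s:]

# Materialized: build T = [X_S | X_R[fk]] row by row, then multiply.
T = np.hstack([X_S, X_R[fk]])
dense = T @ w

# Factorized: multiply each base table once, then scatter the dimension
# part along the foreign key — no row of R is multiplied more than once.
factorized = X_S @ w_S + (X_R @ w_R)[fk]

assert np.allclose(dense, factorized)
```

The saving grows with join redundancy: when many fact rows share one dimension row, the factorized form does O(n_s·d_s + n_r·d_r) work versus O(n_s·(d_s + d_r)) for the materialized multiply.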
“…The database community has identified various opportunities for optimizing DPR. Several approaches identify the join as a key bottleneck in DPR and optimize it [37,15,49,38]. Kumar et al [37] optimize generalized linear models directly over factorized / normalized representations of relational data, avoiding key-foreign key joins.…”
Section: Related Work
confidence: 99%