Proceedings of the 2018 International Conference on Management of Data 2018
DOI: 10.1145/3183713.3196930

Sketching Linear Classifiers over Data Streams

Abstract: We introduce a new sub-linear space sketch, the Weight-Median Sketch, for learning compressed linear classifiers over data streams while supporting the efficient recovery of large-magnitude weights in the model. This enables memory-limited execution of several statistical analyses over streams, including online feature selection, streaming data explanation, relative deltoid detection, and streaming estimation of pointwise mutual information. Unlike related sketches that capture the most frequently-occurring feat…
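The idea in the abstract, learning a linear classifier inside a fixed-size sketch and then reading back its large weights, can be illustrated with a short Python sketch. This is a simplified illustration and not the authors' exact algorithm: the class name `SketchedLogisticClassifier`, the BLAKE2-based hashing, the logistic loss, the plain averaging of rows in the margin, the learning rate, and the synthetic stream are all assumptions made for this example.

```python
import hashlib
import math
import random
from statistics import median


class SketchedLogisticClassifier:
    """Toy linear classifier stored in a Count-Sketch-style table (illustrative only)."""

    def __init__(self, depth=3, width=256, lr=0.1):
        self.depth, self.width, self.lr = depth, width, lr
        self.table = [[0.0] * width for _ in range(depth)]

    def _hash(self, row, kind, feature):
        # Deterministic 64-bit hash of (row, kind, feature).
        data = f"{row}:{kind}:{feature}".encode()
        return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")

    def _bucket(self, row, feature):
        return self._hash(row, "bucket", feature) % self.width

    def _sign(self, row, feature):
        return 1.0 if self._hash(row, "sign", feature) & 1 else -1.0

    def margin(self, x):
        # Average, over rows, of the signed bucket values weighted by feature values.
        total = 0.0
        for f, v in x.items():
            for r in range(self.depth):
                total += self._sign(r, f) * self.table[r][self._bucket(r, f)] * v
        return total / self.depth

    def update(self, x, y):
        # One online gradient step on the logistic loss log(1 + exp(-y * margin)).
        z = max(-30.0, min(30.0, y * self.margin(x)))
        g = -y / (1.0 + math.exp(z))  # derivative of the loss w.r.t. the margin
        for f, v in x.items():
            for r in range(self.depth):
                step = self.lr * g * self._sign(r, f) * v / self.depth
                self.table[r][self._bucket(r, f)] -= step

    def estimate(self, feature):
        # Recover a weight estimate as the median of the signed bucket values.
        return median(
            self._sign(r, feature) * self.table[r][self._bucket(r, feature)]
            for r in range(self.depth)
        )


# Synthetic stream (an assumption): one informative feature per class plus noise.
rng = random.Random(1)
clf = SketchedLogisticClassifier()
for _ in range(2000):
    y = rng.choice([1, -1])
    x = {"signal" if y == 1 else "anti": 1.0}
    for _ in range(3):
        x[f"noise{rng.randrange(1000)}"] = 1.0
    clf.update(x, y)

print(clf.estimate("signal"), clf.estimate("anti"))
```

The table holds `depth * width` floats no matter how many distinct features appear in the stream, which is the memory-limited property the abstract describes. Taking the median across rows limits the effect of a collision in any single row on the recovered weight.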

Citations: cited by 40 publications (24 citation statements)
References: 67 publications
“…We first compute the optimal detection threshold R in (6) and the corresponding failure probability p_ZC(R) under both scenarios, shown in Figure 1 and Figure 2 respectively. As baseline, the threshold for median filtering is (t + 1)/2 as discussed and its failure probability is p_med in (5).…”
Section: Experimental Validation (mentioning)
confidence: 99%
“…Count sketch [1], which satisfies both constraints, has been widely applied for heavy components recovery in a variety of applications such as distributed learning [3] and feature selection [4], among others [5][6][7]. Despite extensive implementations in recent years, the count sketch algorithm has rarely been examined or questioned.…”
Section: Introduction (mentioning)
confidence: 99%
“…Since the above three kinds of sketches are most widely used, we mainly study how to use Cluster-Reduce to compress them (see §4 for details). There are many other sketches, such as ADA-SKETCH [17], HeavyGuardian [6], WavingSketch [5], MaxLogHash [18], and others [11, 14, 19-23], that focus on estimating item frequency, finding frequent items, measuring network traffic, estimating set similarity, and other data mining tasks.…”
Section: Sketches In Distributed Data Streams (mentioning)
confidence: 99%
“…Another line of work that we draw from applies sketching techniques to learning tasks where the model itself cannot fit in memory [Aghazadeh et al., 2018; Tai et al., 2018]. In our setting, we can afford to keep a dense version of the model in memory, and we only make use of the memory-saving properties of sketches to reduce communication between nodes participating in distributed learning.…”
Section: Related Work (mentioning)
confidence: 99%