“…There has also been a burgeoning line of recent work that relaxes these assumptions and allows for heavy-tailed underlying distributions [3,28,35], but the resulting algorithms are often complex and, moreover, are batch learning algorithms that require access to the entire dataset, which limits their scalability. For instance, many popular polynomial-time algorithms for heavy-tailed mean estimation [9,43,5,44,6,32] and heavy-tailed linear regression [28,46,42,38] need to store the entire dataset. On the other hand, most successful practical modern learning algorithms are iterative and lightweight, and access data in a "streaming" fashion.…”