Multitask learning and related areas such as multi-source domain adaptation address modern settings where datasets from $N$ related distributions $\{P_t\}$ are to be combined towards improving performance on any single such distribution $D$. A perplexing fact remains in the evolving theory on the subject: while we would hope for performance bounds that account for the contribution from multiple tasks, the vast majority of analyses yield bounds that improve at best with the number $n$ of samples per task, but most often do not improve with $N$. As such, it might at first seem that the distributional settings or aggregation procedures considered in such analyses are somehow unfavorable; however, as we show, the picture is more nuanced, with interestingly hard regimes that might otherwise appear favorable. In particular, we consider a seemingly favorable classification scenario where all tasks $P_t$ share a common optimal classifier $h^*$, and which can be shown to admit a broad range of regimes with improved oracle rates in terms of $N$ and $n$. Some of our main results are as follows:

• We show that, even though such regimes admit minimax rates accounting for both $n$ and $N$, no adaptive algorithm exists; that is, without access to distributional information, no algorithm can guarantee rates that improve with large $N$ for fixed $n$.

• With a bit of additional information, namely a ranking of the tasks $\{P_t\}$ according to their distance to the target $D$, a simple rank-based procedure (sketched below) can achieve near-optimal aggregations of the tasks' datasets, despite a search space exponential in $N$. Interestingly, the optimal aggregation might exclude certain tasks, even though they all share the same $h^*$.
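
To make the rank-based idea concrete, here is a minimal, hypothetical sketch (not the paper's exact procedure): assuming the tasks are already ranked by their (unknown in practice) distance to the target $D$, one can restrict attention to the $N$ prefixes of that ranking rather than all $2^N$ subsets, training on the pooled data of the $k$ closest tasks and selecting $k$ on a small held-out target sample. The base learner (logistic regression) and the toy task generator are illustrative choices only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def prefix_aggregation(ranked_tasks, X_target_val, y_target_val):
    """ranked_tasks: list of (X_t, y_t) pairs, closest task to the target first.

    Trains on the pooled data of the first k ranked tasks for each k = 1..N and
    keeps the k with the lowest error on the held-out target sample, so only N
    candidate aggregations are searched instead of 2^N subsets.
    """
    best_err, best_model, best_k = np.inf, None, 0
    X_pool, y_pool = [], []
    for k, (X_t, y_t) in enumerate(ranked_tasks, start=1):
        X_pool.append(X_t)
        y_pool.append(y_t)
        model = LogisticRegression().fit(np.vstack(X_pool), np.concatenate(y_pool))
        err = np.mean(model.predict(X_target_val) != y_target_val)
        if err < best_err:
            best_err, best_model, best_k = err, model, k
    return best_model, best_k  # tasks ranked beyond best_k are excluded


# Toy usage: tasks share the same optimal classifier (sign of the first
# coordinate) but have increasing label noise, and are ranked by that noise.
rng = np.random.default_rng(0)


def make_task(n, noise):
    X = rng.normal(size=(n, 2))
    y = (X[:, 0] > 0).astype(int)
    flip = rng.random(n) < noise
    y[flip] = 1 - y[flip]
    return X, y


tasks = [make_task(50, noise) for noise in np.linspace(0.05, 0.45, 8)]
X_val, y_val = make_task(30, 0.05)
model, k = prefix_aggregation(tasks, X_val, y_val)
print(f"kept the {k} closest tasks")
```

As the toy run suggests, the selected prefix may stop short of using all $N$ datasets: distant (noisier) tasks can be excluded even though every task shares the same $h^*$.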