Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics 2019
DOI: 10.18653/v1/p19-1151

Soft Representation Learning for Sparse Transfer

Abstract: Transfer learning is effective for improving the performance of related tasks, and multi-task learning (MTL) and cross-lingual learning (CLL) are important instances. This paper argues that hard parameter sharing, i.e., hard-coding the layers shared across different tasks or languages, cannot generalize well when sharing with a loosely related task. Such a case, which we call sparse transfer, may actually hurt performance, a phenomenon known as negative transfer. Our contribution is using adversarial traini…
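The abstract is truncated mid-sentence at "adversarial traini…". Based only on what it states (soft sharing in place of hard-coded shared layers, plus adversarial training to curb negative transfer), the sketch below illustrates the general idea with a shared encoder, per-task private encoders, and a gradient-reversal task discriminator. All names, dimensions, and the loss weighting are illustrative assumptions, not the paper's actual architecture.

```python
# A minimal sketch, NOT the paper's actual architecture: a shared encoder,
# per-task private encoders, and a task discriminator trained adversarially
# (via gradient reversal) so the shared representation stays task-invariant.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; negates (and scales) gradients on backward."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None


class SoftSharingModel(nn.Module):
    def __init__(self, in_dim=300, hid_dim=128, n_tasks=2, n_classes=2):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU())
        self.private = nn.ModuleList(
            [nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU()) for _ in range(n_tasks)]
        )
        # Each task head reads the concatenation of shared and private features.
        self.heads = nn.ModuleList(
            [nn.Linear(2 * hid_dim, n_classes) for _ in range(n_tasks)]
        )
        # The discriminator tries to guess which task produced a shared feature.
        self.task_disc = nn.Linear(hid_dim, n_tasks)

    def forward(self, x, task_id, lambd=1.0):
        s = self.shared(x)
        p = self.private[task_id](x)
        task_logits = self.heads[task_id](torch.cat([s, p], dim=-1))
        disc_logits = self.task_disc(GradReverse.apply(s, lambd))
        return task_logits, disc_logits


# Toy usage: the adversarial term discourages task-identifiable information in the
# shared encoder, one way to mitigate negative transfer under sparse transfer.
model = SoftSharingModel()
x = torch.randn(8, 300)
task_logits, disc_logits = model(x, task_id=0)
task_loss = F.cross_entropy(task_logits, torch.randint(0, 2, (8,)))
adv_loss = F.cross_entropy(disc_logits, torch.zeros(8, dtype=torch.long))
(task_loss + adv_loss).backward()
```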

Cited by 2 publications (2 citation statements)
References 20 publications
“…We are the first to apply this idea and revise it for representation learning in GZSL. A novel multi-task representation learning paradigm is proposed that models task-specific and task-shared representations in parallel, unlike existing paradigms [37,38] that use a single MoE for each sub-task and a hierarchical structure. For the sake of clear understanding, we highlight the distinctions between our approach and those counterparts in Table 1.…”
Section: Related Work (citation type: mentioning)
confidence: 99%
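For readers unfamiliar with the "parallel" layout this citing work contrasts against hierarchical single-MoE designs, here is a minimal, assumption-laden sketch: a pool of task-shared experts and a per-task pool of task-specific experts are mixed by a per-task gate into one representation. ParallelMoE and all sizes are hypothetical names; they are not taken from the cited papers.

```python
# A minimal sketch of a parallel mixture-of-experts layout: task-shared and
# task-specific experts sit side by side and are combined by a per-task gate,
# rather than one hierarchical MoE per sub-task. All names/sizes are assumptions.
import torch
import torch.nn as nn


class ParallelMoE(nn.Module):
    def __init__(self, in_dim=64, hid_dim=32, n_shared=3, n_specific=2, n_tasks=2):
        super().__init__()

        def expert():
            return nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU())

        self.shared_experts = nn.ModuleList([expert() for _ in range(n_shared)])
        self.specific_experts = nn.ModuleList(
            [nn.ModuleList([expert() for _ in range(n_specific)]) for _ in range(n_tasks)]
        )
        # One softmax gate per task over its shared + specific experts.
        self.gates = nn.ModuleList(
            [nn.Linear(in_dim, n_shared + n_specific) for _ in range(n_tasks)]
        )

    def forward(self, x, task_id):
        experts = list(self.shared_experts) + list(self.specific_experts[task_id])
        outputs = torch.stack([e(x) for e in experts], dim=1)                  # (B, E, H)
        weights = torch.softmax(self.gates[task_id](x), dim=-1).unsqueeze(-1)  # (B, E, 1)
        return (weights * outputs).sum(dim=1)                                  # (B, H)


# Toy usage: the same input yields a task-conditioned mix of shared and specific experts.
rep = ParallelMoE()(torch.randn(4, 64), task_id=0)
```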
“…However, they end up highlighting the data sources with more consensus knowledge, which still cannot rigorously define and constrain the accuracy drop for each source in their learning objectives. The second family learns separate models for shared information and specific information, respectively [10,26,35,36,43]. However, the negative transfer may still happen in the models which learn shared information, and is not well defined or formulated in these works.…”
Section: Related Work (citation type: mentioning)
confidence: 99%