Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/d17-1038
Learning to select data for transfer learning with Bayesian Optimization

Abstract: Domain similarity measures can be used to gauge adaptability and select suitable data for transfer learning, but existing approaches define ad hoc measures that are deemed suitable for their respective tasks. Inspired by work on curriculum learning, we propose to learn data selection measures using Bayesian Optimization and evaluate them across models, domains and tasks. Our learned measures outperform existing domain similarity measures significantly on three tasks: sentiment analysis, part-of-speech tagging, and pa…
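The abstract describes scoring source data with a learned similarity measure and selecting the best-matching examples. As a hypothetical sketch (not the paper's implementation, which tunes the feature weights with Bayesian Optimization), the selection step can be illustrated with a single similarity feature, the Jensen-Shannon divergence between an example's term distribution and the target domain's; the `weights` vector stands in for the parameters the paper would learn:

```python
from math import log

def jensen_shannon(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    def kl(a, b):
        return sum(ai * log(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def select_data(source, target_dist, weights, k):
    """Score each source example by a weighted combination of similarity
    features and keep the top-k.  Here `weights` is given; in the paper
    it would be tuned with Bayesian Optimization against task performance."""
    def score(example_dist):
        # single illustrative feature: negative JS divergence to the target domain
        return weights[0] * -jensen_shannon(example_dist, target_dist)
    ranked = sorted(source, key=lambda ex: score(ex["dist"]), reverse=True)
    return ranked[:k]
```

With more features (e.g. vocabulary overlap, perplexity under a target-domain language model), `score` becomes a weighted sum, and the outer Bayesian Optimization loop searches the weight space.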

Cited by 146 publications (151 citation statements)
References 29 publications
“…Although not for cross-lingual transfer, there has been prior work on data selection for training models. Tsvetkov et al (2016a) and Ruder and Plank (2017) use Bayesian optimization for data selection. van der Wees et al (2017) study the effect of data selection on neural machine translation and propose a dynamic method to select relevant training data that improves translation performance.…”
Section: Related Work
confidence: 99%
“…However, we focus on supervised sequence-labeling domain adaptation, where large improvements can be achieved by utilizing only small-scale annotated data from the target domain. Previous works in domain adaptation often try to find a subset of source-domain data that aligns with the target-domain data (Chopra et al, 2013; Ruder and Plank, 2017), which amounts to a kind of source-data sampling, or construct a common feature space; those methods, however, may wash out informative characteristics of target-domain samples. Instance-based domain adaptation (Jiang and Zhai, 2007; Zhang and Xiong, 2018) implements source-sample weighting by assigning higher weights to source-domain samples that are more similar to the target domain.…”
Section: Related Work
confidence: 99%
“…The first category makes the assumption that labeled data from both the source and target domains are available to us, though the amounts may differ [6,43], while the second category assumes that no labeled data from the target domain is available in addition to the labeled source-domain data [27,29]. Our work falls into the first category.…”
Section: Related Work
confidence: 99%
“…In this case, there are also two categories. The first is instance-based methods, which select or reweight the source-domain training samples so that data from the source domain and the target domain share a similar distribution [4,11,27]. The second category is feature-based methods, which aim to locate a common feature space that reduces the differences between the source and target domains.…”
Section: Related Work
confidence: 99%
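The instance-based reweighting the citation statements describe — assigning higher weights to source samples that resemble the target domain — can be sketched minimally. This is an illustrative assumption, not any cited paper's method: each source sample is weighted by its cosine similarity to the mean target feature vector, and the weights are normalized to sum to one.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def instance_weights(source_vecs, target_vecs):
    """Weight each source sample by its cosine similarity to the centroid
    of the target-domain vectors, clipping negatives to zero and
    normalizing the weights to sum to 1."""
    dim = len(target_vecs[0])
    centroid = [sum(v[i] for v in target_vecs) / len(target_vecs) for i in range(dim)]
    raw = [max(cosine(v, centroid), 0.0) for v in source_vecs]
    total = sum(raw)
    return [w / total for w in raw] if total else raw
```

The resulting weights could multiply each sample's loss during training, so source samples dissimilar to the target domain contribute little to the adapted model.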