Towards Making Deep Transfer Learning Never Hurt

Wan, Ruosi; Xiong, Haoyi; Li, Xingjian; Zhu, Zhanxing; Huan, Jun

doi:10.1109/icdm.2019.00068

Cited by 17 publications

(22 citation statements)

References 27 publications

“…However, all the models were trained with eight years of data and are thus still capable of producing accurate predictions. The dataset size affects deep learning performance, as a small amount of training data may degrade the performance [51]. We recreated the DL model for the Malacca Strait (the area with the most extensive dataset size) for each variation in the training set to show the effect of the data size on the model performance (see Figure 11).…”

Section: Discussion: Influence Of Data Size (mentioning)

confidence: 99%

Long-Term Ship Position Prediction Using Automatic Identification System (AIS) Data and End-to-End Deep Learning

Ibadurrahman, Hamada, Wada, et al. 2021

Sensors

The establishment of maritime safety and security is an important concern. Ship position prediction for maritime situational awareness (MSA), as a critical aspect of maritime safety and security, requires a longer time interval than collision avoidance and maritime traffic monitoring. However, previous studies focused mainly on shorter time-interval predictions ranging from 30 min to 10 h. A longer time-interval ship position prediction is required not only for MSA, but also for efficient allocation of ships by shipping companies in accordance with global freight demand. This study used an end-to-end tracking method that inputs the previous position of a vessel to a trained deep learning model to predict its next position with an average 24-h interval. An AIS dataset with a long-time-interval distribution in a nine-year timespan for capesize bulk carriers worldwide was used. In the first experiment, a deep learning model of the Indian Ocean was examined. Subsequently, the model performance was compared for six different oceans and six primary maritime chokepoints to investigate the influence of each area. In the third experiment, a sample location within the Malacca Strait area was selected, and the number of ships was counted daily. The results indicate that the ship position can be predicted accurately with an average time interval of 24 h using deep learning systems with AIS data.

“…4. L2SP regularization, where the weights are decayed towards their pre-trained values rather than 0 during fine-tuning, improves performance when the source and target dataset are closely related, but hinders it when they are less related [21,20,37]. 5. Momentum should be lower for more closely related source and target datasets [18].…”

Section: Related Work (mentioning)

confidence: 99%

“…They showed that a high level of regularization decaying towards the pre-trained weights is beneficial on these datasets. It has since been shown that the L2SP regularizer can result in minimal improvement or even worse performance when the source and target datasets are less related [18,37].…”

Section: L2SP (mentioning)

confidence: 99%

“…However, there are many real world scenarios where the large amounts of training data required to obtain the best performance cannot be met or are prohibitively expensive. Transfer learning has been shown to improve performance in a wide variety of computer vision tasks, particularly when the source and target tasks are closely related and the target task is small [28,22,6,8,33,27,21,20,37]. It has become standard practice to pre-train on Imagenet 1K for many different tasks where the available labeled datasets are orders of magnitude smaller than Imagenet 1K [21,20,37,24,19,27,25,26,9].…”

Section: Introduction (mentioning)

confidence: 99%

“…Transfer learning has been shown to improve performance in a wide variety of computer vision tasks, particularly when the source and target tasks are closely related and the target task is small [28,22,6,8,33,27,21,20,37]. It has become standard practice to pre-train on Imagenet 1K for many different tasks where the available labeled datasets are orders of magnitude smaller than Imagenet 1K [21,20,37,24,19,27,25,26,9]. Several papers published in recent years have questioned this established paradigm.…”

Section: Introduction (mentioning)

confidence: 99%

Non-binary deep transfer learning for image classification

Plested, Shen, Gedeon 2021

Preprint

The current standard for a variety of computer vision tasks using smaller numbers of labelled training examples is to fine-tune from weights pre-trained on a large image classification dataset such as ImageNet. The application of transfer learning and transfer learning methods tends to be rigidly binary. A model is either pre-trained or not pre-trained. Pre-training a model either increases performance or decreases it, the latter being defined as negative transfer. Application of L2-SP regularisation that decays the weights towards their pre-trained values is either applied or all weights are decayed towards 0. This paper re-examines these assumptions. Our recommendations are based on extensive empirical evaluation that demonstrates the application of a non-binary approach to achieve optimal results. (1) Achieving best performance on each individual dataset requires careful adjustment of various transfer learning hyperparameters not usually considered, including number of layers to transfer, different learning rates for different layers and different combinations of L2SP and L2 regularization. (2) Best practice can be achieved using a number of measures of how well the pre-trained weights fit the target dataset to guide optimal hyperparameters. We present methods for non-binary transfer learning including combining L2SP and L2 regularization and performing non-traditional fine-tuning hyperparameter searches. Finally, we suggest heuristics for determining the optimal transfer learning hyperparameters. The benefits of using a non-binary approach are supported by final results that come close to or exceed state of the art performance on a variety of tasks that have traditionally been more difficult for transfer learning.
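The difference between the two penalties named in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the `alpha` and `beta` coefficients, and the choice to combine the two terms as a plain sum are all assumptions made here, since the abstract does not say how L2SP and L2 are combined.

```python
def l2sp_penalty(weights, pretrained, alpha):
    """L2-SP style penalty: (alpha / 2) * sum((w - w0)^2).

    Penalises distance from the pre-trained values w0, so the
    weights are pulled back toward the starting point, not toward 0.
    """
    return 0.5 * alpha * sum((w - w0) ** 2 for w, w0 in zip(weights, pretrained))


def l2_penalty(weights, beta):
    """Standard L2 penalty: (beta / 2) * sum(w^2), pulling weights toward 0."""
    return 0.5 * beta * sum(w ** 2 for w in weights)


def combined_penalty(weights, pretrained, alpha, beta):
    """Hypothetical combination (an assumption): the sum of both terms."""
    return l2sp_penalty(weights, pretrained, alpha) + l2_penalty(weights, beta)


# Illustrative values only; they are not taken from any of the papers.
pretrained = [1.0, 0.0]
finetuned = [1.0, 2.0]
print(l2sp_penalty(finetuned, pretrained, alpha=0.5))
print(l2_penalty(finetuned, beta=0.1))
print(combined_penalty(finetuned, pretrained, alpha=0.5, beta=0.1))
```

With `alpha` set to 0 the combined term reduces to ordinary weight decay, and with `beta` set to 0 it reduces to the L2-SP term alone, which is one way to read the "non-binary" setting between the two extremes.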
