What is Transfer Learning in Deep Learning?


Transfer learning is a deep learning technique that reuses an existing pre-trained model for a new, similar task. As we already know, deep learning models are data-hungry, and training huge models on large datasets is computationally expensive. Therefore, there are two essential problems:

· Lack of data

· Lack of computational resources


Transfer learning aims to handle these problems. There are some strategies in the transfer learning approach:


1) Directly Use the Pre-trained Model

In this strategy, the pre-trained model can solve the target problem without any further training, provided that the target classes are among the classes the model was originally trained on.

· Example: State-of-the-art object detection models can be applied directly to the target problem because they are trained on large-scale datasets. Since these datasets contain hundreds of unique classes, the models can often predict the target classes in the new task out of the box.



2) Treating a Pre-trained Model as a Feature Extractor

Instead of using the pre-trained model directly as in the previous strategy, we can treat it as a feature extractor by discarding the fully connected layers. This strategy is mostly used with convolutional neural networks. Since convolutions are just sliding filters, they do not depend on the input image size (only the number of channels must match), so the convolutional layers can act as a general-purpose feature extractor. This strategy is widely employed when the target dataset is reasonably small.

· Example 1: We can take a model pre-trained on the ImageNet dataset and flatten the output of its convolutional layers. This allows us to add new output layer(s) on top of the pre-trained network while reusing the learned features for the target image/dataset.

· Example 2: Training an embedding layer from scratch is sometimes hard. Pre-trained embeddings such as GloVe can serve as a fixed feature extractor for text.



3) Fine-Tuning the Last Layers of a Pre-trained Model

We can also fine-tune some of the existing model's layers. In this strategy, the earlier layers of the pre-trained network are frozen and not updated during training; that is, they are excluded from the backpropagation process, while the last layers are updated on the new data.

· Example 1: Suppose the network was trained on a dataset of cars, and the new objective also concerns cars, or trucks, or other objects with tires, so that the two datasets share common features. In that case, fine-tuning only the very last layers is feasible: since the first convolutional layers learn low-level features, fine-tuning the last layers is adequate in most cases.



Source: https://dergipark.org.tr/en/pub/dubited/article/878779


4) Using a Pre-trained Model as a Starting Point

The last strategy we want to mention is using the pre-trained model's weights as a starting point rather than starting from randomly initialized weights. If the tasks are similar and a considerably large amount of data is available, this strategy can be beneficial. However, it is computationally intensive: no layers are frozen, so all of them are updated with the new data points during training.

· Example 1: One can take a pre-trained ResNet-50 CNN and use its weights as a starting point. This usually allows faster and better convergence, since the convolutional layers start from a useful state rather than from random values.