Does fine-tuning mean retraining the entire model?

Does fine-tuning require retraining the entire model?

What does it do, exactly?


I found a couple of things:
google dot com/search?q=Does+fine-tuning+mean+retraining+the+entire+model

“No, you don’t need to retrain the entire model. Fine-tuning refers to taking the weights trained in the general model and then continuing training a bit using your specific data. Using this approach, typically the only things you need to fully train are the layers performing the downstream task on top of the model creating the representation of the data, often just a handful of densely connected layers to perform e.g. classification, which are orders of magnitude less expensive to train than the representation model.”
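To see why that head is "orders of magnitude less expensive", here is a back-of-envelope parameter count. All sizes are illustrative assumptions (roughly BERT-base-like), not numbers from the quote:

```python
# Back-of-envelope: trainable parameters when only a new
# classification head is trained on top of a frozen encoder.
# All sizes below are illustrative assumptions.

encoder_params = 110_000_000   # assumed frozen representation model
hidden_size = 768              # assumed encoder output width
num_classes = 5                # assumed number of downstream labels

# Dense classification head: weight matrix + bias vector
head_params = hidden_size * num_classes + num_classes

print(f"frozen encoder params: {encoder_params:,}")
print(f"trainable head params: {head_params:,}")
print(f"ratio: ~1/{encoder_params // head_params:,} of the full model")
```

With these assumed sizes, the trainable head is a few thousand parameters against ~110M frozen ones.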

"The most common incarnation of transfer learning in the context of deep learning is the following workflow:

  1. Take layers from a previously trained model.
  2. Freeze them, so as to avoid destroying any of the information they contain during future training rounds.
  3. Add some new, trainable layers on top of the frozen layers. They will learn to turn the old features into predictions on a new dataset.
  4. Train the new layers on your dataset.

A last, optional step, is fine-tuning, which consists of unfreezing the entire model you obtained above (or part of it), and re-training it on the new data with a very low learning rate."
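The four-step workflow above can be sketched in plain Python (no framework), so the mechanics are visible. The "pretrained" layer, the new head, and the data here are all invented for illustration; in practice you would freeze real layers via your framework's API:

```python
# Minimal sketch of the freeze-then-train workflow, in plain Python.

# Steps 1-2: a layer from a previously trained model, kept frozen
# (its weights are simply never updated).
def pretrained_feature(x):
    return 2.0 * x + 1.0  # frozen weights: w=2.0, b=1.0

# Step 3: a new, trainable layer stacked on top of the frozen one.
w_new, b_new = 0.0, 0.0

# Step 4: train only the new layer on the new dataset.
data = [(0.0, 1.0), (1.0, 3.5), (2.0, 6.0)]  # toy (x, target) pairs
lr = 0.01
for _ in range(500):
    for x, target in data:
        feat = pretrained_feature(x)   # frozen forward pass
        pred = w_new * feat + b_new    # trainable head
        err = pred - target
        w_new -= lr * err * feat       # gradient step on the head only;
        b_new -= lr * err              # the frozen layer is untouched

print(round(w_new, 2), round(b_new, 2))
```

Only `w_new` and `b_new` ever change; the optional final fine-tuning step would then also allow `pretrained_feature`'s weights to move, with a much lower learning rate.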

towardsdatascience dot com/how-to-fine-tune-your-neural-network-for-your-data-image-classification-d0f01c92300b

"We can fine-tune our transfer model to our unique data by slowly unfreezing the gradients of the convolutional layers as we train. First, we start by loading our model and changing the last layer exactly the same way we did with our first transfer model. Then, after each epoch of training, we can begin updating the weights of the next convolutional layer of our network with some code that looks something like this.

We slowly update the gradients starting with the lowest level layers and working our way to the top. This keeps a lot of the information our model has from the pre-trained weights, while helping fine-tune it to our book covers. After training and testing, we can compare it to our other two networks."
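The per-epoch unfreezing schedule described above can be sketched as follows. The layer names, the number of epochs, and the bottom-up order (as in the quoted passage) are illustrative assumptions; real code would flip a `trainable`/`requires_grad` flag on the framework's layer objects:

```python
# Hedged sketch of gradual unfreezing: one more layer becomes
# trainable after each epoch. Layer names are invented.

layers = ["conv1", "conv2", "conv3", "conv4", "new_head"]
# Initially only the freshly added head is trainable.
trainable = {name: (name == "new_head") for name in layers}

def unfreeze_next(trainable, order):
    """Unfreeze the first still-frozen layer in `order`."""
    for name in order:
        if not trainable[name]:
            trainable[name] = True
            return name
    return None

# Lowest-level layers first, working up, as in the quoted passage.
schedule = ["conv1", "conv2", "conv3", "conv4"]
for epoch in range(4):
    # ... train for one epoch here ...
    unfrozen = unfreeze_next(trainable, schedule)
    print(f"after epoch {epoch}: unfroze {unfrozen}")
```

After the last epoch every layer is trainable, but the earlier epochs preserved most of the pretrained information by keeping the upper layers frozen while the head adapted.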

However, for very large models the full model still has to fit in memory during fine-tuning; what you save is training time, not memory.

You may want Few-shot Learning for very large models.


Vanilla fine-tuning of BERT updates every parameter. You can freeze some layers if you want. There are also approaches that update only a subset of parameters, such as adapters; see the link.
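The adapter idea can be sketched in plain Python: the pretrained weights stay frozen, and only a small extra module inserted into the network is trained. The toy forward pass, sizes, and data below are illustrative assumptions, not any specific library's API:

```python
# Hedged sketch of an adapter-style partial update.

# Frozen pretrained transformation (never updated).
def frozen_layer(x):
    return 3.0 * x

# Small trainable adapter added in a residual fashion:
# output = frozen(x) + adapter(x); only `a` is ever updated.
a = 0.0

data = [(1.0, 3.5), (2.0, 7.0)]  # toy (x, target): target = 3.5 * x
lr = 0.05
for _ in range(200):
    for x, target in data:
        pred = frozen_layer(x) + a * x
        err = pred - target
        a -= lr * err * x  # gradient step on the adapter parameter only

print(round(a, 2))  # → 0.5: the adapter learns only the residual
```

The adapter carries one parameter here versus the frozen layer's pretrained weight, which is the point of the technique: the number of trained parameters is a small fraction of the full model.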