Custom Tasks and BERT Fine Tuning

I am using the transformers library to get embeddings for sentences and tokens. More specifically, I use the embedding of the first token, [CLS], as the sentence representation and compare sentences with cosine similarity. This approach is naive and completely unsupervised.
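For reference, a minimal sketch of that approach (assuming bert-base-uncased and the standard transformers / PyTorch APIs; the sentences are just placeholders):

```python
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

def cls_embedding(sentence):
    # Run the sentence through BERT and take the final hidden state of the
    # first token ([CLS]) as the sentence representation.
    inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state[:, 0, :]  # shape: (1, hidden_size)

emb_a = cls_embedding("The cat sat on the mat.")
emb_b = cls_embedding("A cat was sitting on a rug.")
print(torch.nn.functional.cosine_similarity(emb_a, emb_b).item())
```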

Now I would like to gain some experience in fine-tuning the model: for example, how to fine-tune BERT for NER and how to use BERT for sentence pairs. Where can I learn how to fine-tune BERT?

There is a nice tutorial on fine-tuning with the transformers library and PyTorch by Chris McCormick here: https://mccormickml.com/2019/07/22/BERT-fine-tuning/#4-train-our-classification-model. See also his YouTube videos and other posts.
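Boiled down, that kind of fine-tuning is a standard PyTorch training loop around BertForSequenceClassification. A minimal sketch (toy data, bert-base-uncased, and hyperparameters chosen purely for illustration, not the tutorial's exact code):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# Toy data standing in for a real labelled dataset.
texts = ["great movie", "terrible movie", "loved it", "waste of time"]
labels = torch.tensor([1, 0, 1, 0])
enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
loader = DataLoader(TensorDataset(enc["input_ids"], enc["attention_mask"], labels),
                    batch_size=2, shuffle=True)

model.train()
for epoch in range(3):
    for input_ids, attention_mask, batch_labels in loader:
        optimizer.zero_grad()
        outputs = model(input_ids=input_ids,
                        attention_mask=attention_mask,
                        labels=batch_labels)
        outputs.loss.backward()  # the model returns the cross-entropy loss
        optimizer.step()
```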

If you want to build a custom model, try this notebook by abhimishra (the pattern is sketched below): https://github.com/abhimishra91/transformers-tutorials/blob/master/transformers_multi_label_classification.ipynb
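The general pattern there is to wrap BertModel in your own torch.nn.Module and add a task-specific head. A rough sketch (class name, label count and dropout rate are my own placeholders, not the notebook's code):

```python
import torch
from transformers import BertModel, BertTokenizer

class MultiLabelBert(torch.nn.Module):
    """BERT backbone with a custom multi-label classification head."""
    def __init__(self, num_labels):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.dropout = torch.nn.Dropout(0.1)
        self.classifier = torch.nn.Linear(self.bert.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        pooled = outputs.pooler_output                 # [CLS]-based sentence vector
        return self.classifier(self.dropout(pooled))   # one logit per label

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = MultiLabelBert(num_labels=6)
enc = tokenizer(["an example document"], return_tensors="pt")
logits = model(enc["input_ids"], enc["attention_mask"])
# Multi-label: train with BCEWithLogitsLoss against a multi-hot target vector.
loss_fn = torch.nn.BCEWithLogitsLoss()
```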


What is the difference between fine-tuning and a custom model?

The standard BERT-base has 12 layers, each with 12 attention heads, and uses 768 dimensions for the vector encoding. Those values cannot be changed after the model has been created and pre-trained. When you fine-tune a BERT-base model to solve your own task, you change the values of the weight parameters within the model, but you don’t change the number of layers or heads, and you don’t change the 768 dimensions.
If you create and pre-train your own BERT model, then you can choose any number of layers, heads or dimensions, and you can choose how many tokens your BERT has in its vocabulary. I believe we would call this a custom model.
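To make the distinction concrete, here is a small sketch (the custom sizes are arbitrary, chosen only for illustration) of reading those architecture values from a pre-trained checkpoint versus choosing your own for a from-scratch, randomly initialised model:

```python
from transformers import BertConfig, BertModel

# The architecture hyperparameters live in the config of the pre-trained model.
base = BertModel.from_pretrained("bert-base-uncased")
print(base.config.num_hidden_layers,    # 12
      base.config.num_attention_heads,  # 12
      base.config.hidden_size)          # 768

# A "custom" BERT picks its own values, but then has to be pre-trained from scratch.
custom_config = BertConfig(vocab_size=20000,
                           num_hidden_layers=6,
                           num_attention_heads=8,
                           hidden_size=512,
                           intermediate_size=2048)
custom_model = BertModel(custom_config)  # randomly initialised weights
```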

By the way, there are several different sizes of pre-trained BERT model available. For example, see this link for BERT-Small, BERT-Tiny, etc., which were created by the original BERT team (Devlin et al.).

You might also be interested in BERT-like models that come in different sizes and have been pre-trained in different ways, such as DistilBERT, ALBERT and RoBERTa.

Hello again. I’ve just re-read what I wrote above and realised the second sentence is not quite right. Those values can be changed after pre-training; this is called pruning. However, they are not altered during normal fine-tuning.
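For example, the transformers library exposes head pruning directly on a loaded model. A minimal sketch (the layer and head indices here are arbitrary examples):

```python
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")
# Remove attention heads 0 and 2 from layer 0, and head 5 from layer 11.
# The pruned model keeps its learned weights but has fewer heads in those layers.
model.prune_heads({0: [0, 2], 11: [5]})
```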