I have been training a BERT model on a large unsupervised dataset, and now I want to fine-tune it on a small labelled dataset. I can't quite grasp how to do this conceptually, and I'm hoping some of you can help me out.
When doing the unsupervised training/self-training, everything seems fine, and I think I understand it.
In this case, my network is a standard BERT with a linear layer on top that maps the standard 768 hidden features down to 30, which is my vocab_size. (I'm training on gene sequences, so one sample in my dataset looks like this:
ASDGDFASGDFSGSDASFASDAUYRES
where I do the standard thing of masking out some of the letters and trying to predict them.)
So for standard training, my setup looks like this:
predicted_sequence=bert(input_sequence,masked_input_sequence)
loss = crossentropy(predicted_sequence, input_sequence)  # target = the original tokens, scored at the masked positions
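To make that concrete, here is a minimal runnable sketch of what I mean. The tiny two-layer encoder, the MASK_ID value, and the 15% masking rate are just placeholders for my real setup, not part of any library:

```python
import torch
import torch.nn as nn

VOCAB = 30            # my vocab_size (gene alphabet + special tokens)
HIDDEN = 768          # standard BERT hidden size
MASK_ID = VOCAB - 1   # hypothetical id for the [MASK] token

class TinyBertMLM(nn.Module):
    """Stand-in for my setup: BERT-style encoder + linear head 768 -> 30."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, HIDDEN)
        layer = nn.TransformerEncoderLayer(HIDDEN, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(HIDDEN, VOCAB)  # the linear layer on top

    def forward(self, ids):
        return self.head(self.encoder(self.embed(ids)))  # (B, L, VOCAB) logits

torch.manual_seed(0)
B, L = 4, 27
input_sequence = torch.randint(0, 26, (B, L))   # original (unmasked) tokens
mask = torch.rand(B, L) < 0.15                  # mask ~15% of positions
mask[0, 0] = True                               # make sure at least one is masked
masked_input_sequence = input_sequence.clone()
masked_input_sequence[mask] = MASK_ID

logits = TinyBertMLM()(masked_input_sequence)
# targets are the ORIGINAL tokens; unmasked positions are ignored via -100
targets = input_sequence.masked_fill(~mask, -100)
loss = nn.functional.cross_entropy(
    logits.view(-1, VOCAB), targets.view(-1), ignore_index=-100
)
```

So the model only ever sees the masked sequence, and the loss compares its predictions at the masked positions against the original letters.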
However, when I now want to switch to fine-tuning, I'm not really sure what to do. In this case each sample in my dataset consists of both a gene sequence and a label sequence:
DFASDGFTHGFDDFSDASFDASF , 00000001111111100000000022222
How do I change my network so that I can fine-tune it to predict these new labels? Do I still use bert(input, masked_input_sequence)? Do I remove the linear layer on top of BERT? Or what is the conceptual idea here?
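For reference, here is my best guess at what the fine-tuning setup would look like: keep the pretrained encoder, swap the 768 -> 30 MLM head for a new linear layer over the label classes, and train on unmasked sequences with one label per letter. The TokenClassifier class, the dummy stand-in body, and the 3-class assumption are all mine, just to make the sketch run. Is this the right idea?

```python
import torch
import torch.nn as nn

HIDDEN = 768
NUM_LABELS = 3   # my labels are 0/1/2 in the example above

class TokenClassifier(nn.Module):
    """Pretrained BERT body + a NEW per-position classification head."""
    def __init__(self, pretrained_body):
        super().__init__()
        self.body = pretrained_body                      # reuse pretrained weights
        self.classifier = nn.Linear(HIDDEN, NUM_LABELS)  # replaces the 768 -> 30 head

    def forward(self, ids):
        hidden = self.body(ids)         # (B, L, 768); no masking during fine-tuning
        return self.classifier(hidden)  # (B, L, NUM_LABELS) per-letter label logits

# stand-in for the pretrained embedding + encoder so the sketch runs
pretrained_body = nn.Sequential(
    nn.Embedding(30, HIDDEN),
    nn.TransformerEncoderLayer(HIDDEN, nhead=8, batch_first=True),
)

model = TokenClassifier(pretrained_body)
ids = torch.randint(0, 26, (2, 23))              # a batch of gene sequences
labels = torch.randint(0, NUM_LABELS, (2, 23))   # one label per letter
logits = model(ids)
loss = nn.functional.cross_entropy(logits.view(-1, NUM_LABELS), labels.view(-1))
```

In other words, my guess is that the input is no longer masked at all, and the cross-entropy now runs over the label classes at every position instead of over the vocabulary at masked positions. Does that match how fine-tuning is supposed to work?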