How to fine-tune a pre-trained model and then get the embeddings?

I would like to fine-tune a pre-trained model. This is the model:

from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
model = AutoModel.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")

This is the data (I know it is not clinical but let’s roll with it for now):

import pandas as pd
from fastai.datasets import untar_data, URLs
path = untar_data(URLs.IMDB_SAMPLE)
df = pd.read_csv(path/'texts.csv')
df.head()

How can I fine-tune the above model with this data? I know the answer is here but I cannot figure it out.

I would then like to extract the embeddings. I tried model.last_hidden_state (since I have seen outputs.last_hidden_state used elsewhere), but that does not work.

Please search the internet for a minute or two before asking questions. This is a VERY common use case, as you may have expected, and it takes us too much time to keep answering the same questions. Thanks.

The first hit that I got on Google already gives you a tutorial on fine-tuning: Fine-tuning a pretrained model — transformers 4.10.1 documentation

Second: Fine-tuning with custom datasets — transformers 4.10.1 documentation

Notebooks: 🤗 Transformers Notebooks — transformers 4.10.1 documentation
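For orientation, the core of what those tutorials do can be sketched as a plain PyTorch training loop. This is only a minimal sketch under stated assumptions: it uses a tiny, randomly initialised BERT classifier and made-up token ids so that it runs without downloading anything. For the real task you would presumably load the checkpoint with AutoModelForSequenceClassification.from_pretrained("emilyalsentzer/Bio_ClinicalBERT", num_labels=2), build the inputs by running the tokenizer over the text column of the dataframe, and map the labels to integers.

```python
import torch
from transformers import BertConfig, BertForSequenceClassification

torch.manual_seed(0)

# Assumption: a tiny random BERT stands in for the pretrained checkpoint,
# so this sketch runs offline. The sizes below are arbitrary.
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
    num_labels=2,
)
model = BertForSequenceClassification(config)

# Assumption: fake batch of 8 "reviews", 16 token ids each, with binary labels.
# In practice these come from tokenizer(texts, padding=True, truncation=True,
# return_tensors="pt") and from the label column of the dataframe.
input_ids = torch.randint(1, 100, (8, 16))
attention_mask = torch.ones_like(input_ids)
labels = torch.tensor([0, 1, 0, 1, 0, 1, 0, 1])

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

model.train()
losses = []
for step in range(3):
    optimizer.zero_grad()
    # Passing labels makes the model compute the classification loss itself.
    outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
    outputs.loss.backward()
    optimizer.step()
    losses.append(outputs.loss.item())

# The fine-tuned encoder (without the classification head) is what you
# would later use to compute embeddings.
encoder = model.base_model
```

The Trainer API from the tutorials wraps this same loop and adds batching, evaluation, and checkpointing, so it is usually the more convenient choice on a real dataset.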

Of course, you cannot get the last hidden states as an attribute of the model. You first need to do a forward pass with some input data. You can then extract the last hidden state from the output that the model returns.
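The forward pass described above can be sketched roughly like this. To keep the example runnable offline it assumes a tiny, randomly initialised BertModel and hand-written token ids; with the model from the question you would load it via AutoModel.from_pretrained(...) and get input_ids and attention_mask from tokenizer(texts, padding=True, truncation=True, return_tensors="pt"). The mean pooling at the end is just one common way to turn token vectors into one vector per text, not the only option.

```python
import torch
from transformers import BertConfig, BertModel

# Assumption: tiny random BERT as a stand-in for the pretrained model.
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
)
model = BertModel(config)
model.eval()  # disable dropout for inference

# Assumption: stand-in for tokenizer output. Two sequences of length 5;
# the second one has two padding positions (mask value 0).
input_ids = torch.tensor([[2, 10, 11, 12, 3], [2, 20, 3, 0, 0]])
attention_mask = torch.tensor([[1, 1, 1, 1, 1], [1, 1, 1, 0, 0]])

# The hidden states only exist on the output of a forward pass.
with torch.no_grad():
    outputs = model(input_ids=input_ids, attention_mask=attention_mask)

# Shape: (batch_size, sequence_length, hidden_size)
last_hidden = outputs.last_hidden_state

# One embedding per text: average the token vectors, ignoring padding.
mask = attention_mask.unsqueeze(-1).float()
sentence_embeddings = (last_hidden * mask).sum(dim=1) / mask.sum(dim=1)
```

Another common choice is to take the vector of the first token, last_hidden[:, 0], as the text embedding.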
