I would like to fine-tune a pre-trained model. This is the model:
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
model = AutoModel.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
This is the data (I know it is not clinical but let’s roll with it for now):
import pandas as pd
from fastai.datasets import untar_data, URLs
path = untar_data(URLs.IMDB_SAMPLE)
df = pd.read_csv(path/'texts.csv')
df.head()
How can I fine-tune the above model with this data? I know the answer is here but I cannot figure it out.
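One possible approach, sketched under assumptions: put a classification head on the encoder and train it with a plain PyTorch loop. The helper name `fine_tune` and the toy batch are made up for illustration, and a tiny randomly initialised BERT stands in for Bio_ClinicalBERT so the snippet runs without downloading anything.

```python
import torch
from torch.optim import AdamW
from transformers import BertConfig, BertForSequenceClassification


def fine_tune(model, batches, epochs=1, lr=2e-5):
    """Hypothetical helper: a basic training loop.

    Each batch is a dict with input_ids, attention_mask and labels.
    Returns the loss recorded at every optimisation step.
    """
    optimizer = AdamW(model.parameters(), lr=lr)
    model.train()
    losses = []
    for _ in range(epochs):
        for batch in batches:
            optimizer.zero_grad()
            outputs = model(**batch)  # passing labels makes the model return a loss
            outputs.loss.backward()
            optimizer.step()
            losses.append(outputs.loss.item())
    return losses


# Stand-in model (assumption): a tiny random BERT instead of Bio_ClinicalBERT.
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=64,
    num_labels=2,
)
tiny_classifier = BertForSequenceClassification(config)

# Toy batch (assumption): two already-tokenized reviews with binary labels.
toy_batch = {
    "input_ids": torch.tensor([[2, 5, 6, 3], [2, 7, 8, 3]]),
    "attention_mask": torch.ones(2, 4, dtype=torch.long),
    "labels": torch.tensor([0, 1]),
}

losses = fine_tune(tiny_classifier, [toy_batch], epochs=2)
print(losses)
```

With the real checkpoint, the stand-in would presumably be replaced by AutoModelForSequenceClassification.from_pretrained("emilyalsentzer/Bio_ClinicalBERT", num_labels=2), and each batch built with tokenizer(texts, padding=True, truncation=True, return_tensors="pt") plus a labels tensor. This assumes the CSV has a text column and a label column that can be mapped to 0/1.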
After fine-tuning, I would like to extract the embeddings. I tried model.last_hidden_state (since I have seen outputs.last_hidden_state used), but that does not work either.
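A likely reason for the error: last_hidden_state is an attribute of the object returned by calling the model on inputs, not of the model itself. Below is a minimal sketch of reading it and mean-pooling it into one vector per text. The helper name `mean_pooled_embeddings` and the toy inputs are made up for illustration, mean pooling is only one of several pooling choices, and a tiny randomly initialised BERT again stands in for Bio_ClinicalBERT so the snippet runs offline.

```python
import torch
from transformers import BertConfig, BertModel


def mean_pooled_embeddings(model, input_ids, attention_mask):
    """Hypothetical helper: average token states over non-padding positions."""
    model.eval()
    with torch.no_grad():
        outputs = model(input_ids=input_ids, attention_mask=attention_mask)
    hidden = outputs.last_hidden_state  # shape: (batch, seq_len, hidden_size)
    mask = attention_mask.unsqueeze(-1).to(hidden.dtype)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1)


# Stand-in model (assumption): a tiny random BERT instead of Bio_ClinicalBERT.
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=64,
)
tiny_model = BertModel(config)

# Toy inputs (assumption): two padded sequences; 0 marks padding in the mask.
input_ids = torch.tensor([[2, 5, 6, 3, 0], [2, 7, 3, 0, 0]])
attention_mask = torch.tensor([[1, 1, 1, 1, 0], [1, 1, 1, 0, 0]])

embeddings = mean_pooled_embeddings(tiny_model, input_ids, attention_mask)
print(embeddings.shape)
```

With the real model, input_ids and attention_mask would come from tokenizer(texts, padding=True, truncation=True, return_tensors="pt"), and the resulting vectors would have the checkpoint's hidden size rather than 32.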