Hey, I wanted to fine-tune an mt5-base model for my project (machine translation), but when I try to freeze all the parameters except the language-modeling head I get an error. Can anyone help me understand why this happens?
(I also read in the docs that fine-tuning transformers usually gives better results when every parameter is updated rather than frozen, but I don't have enough compute for that. Is that actually the case?)
This is how I freeze the parameters:
from transformers import T5ForConditionalGeneration, T5TokenizerFast, Trainer, TrainingArguments

model = T5ForConditionalGeneration.from_pretrained("t5-small")
tokenizer = T5TokenizerFast.from_pretrained("t5-small")

# freeze every parameter in the model
for param in model.parameters():
    param.requires_grad = False
# then try to unfreeze only the LM head
model.lm_head.requires_grad = True
training_args = TrainingArguments(
    output_dir="mt5-finetuned",
    num_train_epochs=3,
    per_device_train_batch_size=32,  ## lower batch sizes
    per_device_eval_batch_size=32,   ## lower batch sizes
    evaluation_strategy="epoch",
    learning_rate=5e-4,
    weight_decay=0.01,
    save_total_limit=3,
    # fp16=True,  ## lower precision
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset,  # ["train"]
    eval_dataset=dataset,   # ["validation"]
)

trainer.train()
But I get this error:
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
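Could the problem be that requires_grad has to be set on the parameters of lm_head rather than on the module object itself? That's just a guess on my part and I haven't verified it, but I mean something like this (plain PyTorch, untested):

# untested guess: flip requires_grad on the head's parameters, not on the module attribute
for param in model.lm_head.parameters():
    param.requires_grad = True

# sanity check: count how many parameters would actually be trained
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable}")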
Thanks everyone in advance.