Finetuning Llama-3.2-Vision-Instruct

Hello everyone!

I’m trying to finetune the Llama-3.2-Vision-Instruct model. I want to train only the multi_modal_projector, which is a linear layer. But when I set requires_grad = True on only those parameters, loss.requires_grad comes out False, which means that for some reason the loss is not getting back-propagated to the multimodal projector weights.
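
Roughly what I’m doing is below (a minimal sketch; the checkpoint id and the `multi_modal_projector` attribute name are my assumptions based on the transformers Mllama implementation, so adjust if yours differ):

```python
import torch
from transformers import MllamaForConditionalGeneration, AutoProcessor

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"  # assumed checkpoint
model = MllamaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16
)
processor = AutoProcessor.from_pretrained(model_id)

# Freeze everything, then unfreeze only the multimodal projector.
for param in model.parameters():
    param.requires_grad = False
for param in model.multi_modal_projector.parameters():
    param.requires_grad = True

# Sanity check: these should be the only trainable parameters.
print([name for name, p in model.named_parameters() if p.requires_grad])

# After a forward pass with labels (batch built with the processor),
# outputs.loss.requires_grad comes out False for me:
# outputs = model(**batch)
# print(outputs.loss.requires_grad)
```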

I can forcefully set loss.requires_grad = True and train, but that seems very hacky.
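
Concretely, the hack looks something like this, continuing from the sketch above (just illustrating what I mean, not recommending it):

```python
outputs = model(**batch)
loss = outputs.loss

# Hacky workaround: force the loss tensor to track gradients.
# If nothing in the forward graph required grad, though, this only
# silences the error; backward() then has no graph to propagate through.
loss.requires_grad_(True)
loss.backward()
```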

Has anyone else faced this?

Thank you for your time.


I wonder if that’s it?

Yes, but I’m wondering why I have to set loss.requires_grad manually. That’s typically not the case. For example, it isn’t needed in the tutorial here: Fine-tune a pretrained model.
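
To illustrate, the native PyTorch loop in that tutorial is essentially this pattern, with no need to touch requires_grad on the loss (a sketch from memory, not a verbatim copy of the tutorial):

```python
from torch.optim import AdamW

optimizer = AdamW(model.parameters(), lr=5e-5)

model.train()
for batch in train_dataloader:
    outputs = model(**batch)
    loss = outputs.loss
    loss.backward()  # loss.requires_grad is already True here
    optimizer.step()
    optimizer.zero_grad()
```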

Thank you!
