Special tokens & embeddings: requires_grad?

Hi everyone,

I’m a bit lost and could use some help.

I’m trying to fine-tune Gemma-2B-4bit (from Unsloth) with LoRA while incorporating new special tokens. However, after multiple training steps, I noticed that:

  1. The embeddings of the newly added tokens remain unchanged
  2. The model never generates any of the added special tokens
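
For context, here is roughly how the tokens are being added (a minimal sketch; the model id and token strings are placeholders, and I'm showing the plain transformers loading path rather than the Unsloth one for brevity):

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("unsloth/gemma-2b-bnb-4bit")
model = AutoModelForCausalLM.from_pretrained("unsloth/gemma-2b-bnb-4bit")

# Register the new special tokens and grow the embedding matrix to match
# the enlarged vocabulary.
tokenizer.add_special_tokens({"additional_special_tokens": ["<extra_0>", "<extra_1>"]})
model.resize_token_embeddings(len(tokenizer))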

This makes me wonder:

  • Should I set embedding_layer.weight.requires_grad = True for the entire embedding layer, or just for the newly added embeddings?
  • Even if I do set it, is there anything else I need to do so that the special token embeddings actually update? (My current check is in the sketch below.)
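
This is roughly how I'm checking whether the new rows ever move (a diagnostic sketch, continuing from the placeholder tokens above):

# Snapshot the embedding rows of the newly added tokens before training.
new_token_ids = tokenizer.convert_tokens_to_ids(["<extra_0>", "<extra_1>"])
embeddings = model.get_input_embeddings()
before = embeddings.weight[new_token_ids].detach().clone()

# ... run a few training steps ...

after = embeddings.weight[new_token_ids].detach()
print("max abs change:", (after - before).abs().max().item())  # stays 0.0, i.e. the rows never update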

Any guidance or best practices for training with new tokens in a 4-bit quantized setup would be greatly appreciated!

Thanks in advance!


For those who are interested: you can add embed_tokens and lm_head as additional target_modules in the LoraConfig so that the new embeddings and the lm_head actually get updated.

As follows:
config = LoraConfig(…, target_modules=["embed_tokens", "lm_head", "q_proj", "v_proj"])
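
If it helps, a slightly fuller sketch of the same idea (the rank, alpha, and dropout values are just illustrative; with Unsloth the same target_modules list would go into its get_peft_model wrapper instead):

from peft import LoraConfig, get_peft_model

config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    # Listing embed_tokens and lm_head here attaches LoRA adapters to the
    # embedding and output layers as well, so the rows for the new special
    # tokens can actually receive updates during training.
    target_modules=["embed_tokens", "lm_head", "q_proj", "v_proj"],
)
model = get_peft_model(model, config)

An alternative is to pass modules_to_save=["embed_tokens", "lm_head"] instead, which trains and saves those layers in full rather than through low-rank adapters; that costs more memory but doesn't rely on a low-rank update for the brand-new rows.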

