Hi everyone,
I’m a bit lost and could use some help.
I’m trying to fine-tune Gemma-2B-4bit (from Unsloth) with LoRA while incorporating new special tokens. However, after multiple training steps, I noticed that:
- The embeddings of the newly added tokens remain unchanged
- The model never generates any of the added special tokens
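For context, here's roughly what my setup looks like (the model name, the token strings, and the LoRA hyperparameters below are simplified stand-ins for my actual config):

```python
from unsloth import FastLanguageModel

# Load the 4-bit Gemma-2B checkpoint (name is a stand-in for the one I'm using)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-2b-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Register my new special tokens and grow the embedding matrix to match
new_special_tokens = ["<|tool_call|>", "<|tool_response|>"]  # placeholders
tokenizer.add_special_tokens({"additional_special_tokens": new_special_tokens})
model.resize_token_embeddings(len(tokenizer))

# Wrap with LoRA adapters (hyperparameters simplified)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_alpha=16,
    lora_dropout=0.0,
)
```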
This makes me wonder:
- Should I set `embedding_layer.weight.requires_grad = True` for the entire embedding layer, or only for the newly added embeddings (rough sketch of both options after this list)?
- Even if I do set it, is there anything else I need to check to make sure the special-token embeddings actually update?
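To make the two options concrete, this is what I mean. Here `embedding_layer` is just shorthand for `model.get_input_embeddings()`, and the gradient-mask part is only my guess at how one would restrict updates to the new rows:

```python
import torch

embedding_layer = model.get_input_embeddings()

# Option A: unfreeze the whole embedding matrix
embedding_layer.weight.requires_grad = True

# Option B (my guess): unfreeze the matrix, but zero out the gradient of the
# original rows so that only the newly added token rows actually move
num_new_tokens = len(new_special_tokens)  # from the setup sketch above
old_vocab_size = embedding_layer.weight.shape[0] - num_new_tokens

def mask_old_rows(grad: torch.Tensor) -> torch.Tensor:
    grad = grad.clone()
    grad[:old_vocab_size] = 0  # keep the pre-existing embeddings frozen
    return grad

embedding_layer.weight.register_hook(mask_old_rows)
```

I'm also not sure whether the trainer's optimizer will even pick up these embedding parameters once the model is wrapped in PEFT, or whether the "proper" route is something like `modules_to_save=["embed_tokens", "lm_head"]` instead, hence the question.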
Any guidance or best practices for training with new tokens in a 4-bit quantized setup would be greatly appreciated!
Thanks in advance!