Gemma-2 & Phi-3 SFT nuances

There are a few minor nuances that are important during fine-tuning/inference of LLMs. I am experimenting with Gemma-2 and Phi-3, and here are the little things that I don’t quite get:

  1. Some LLMs do not have a pad token. Does it matter whether I set it to the unk token or the eos token?

    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.add_eos_token = True
    tokenizer.pad_token_id = tokenizer.convert_tokens_to_ids(tokenizer.pad_token)
    

    or just

    tokenizer.pad_token = tokenizer.eos_token
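
    or, for the unk option mentioned above (a sketch, assuming the tokenizer actually defines an unk token, as the Gemma-2 and Phi-3 tokenizers do):

    tokenizer.pad_token = tokenizer.unk_token
    tokenizer.pad_token_id = tokenizer.convert_tokens_to_ids(tokenizer.pad_token)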
    
  2. Does the padding side matter? Here it states that "Since LLMs are not trained to continue from pad tokens, your input needs to be left-padded", which I do not get, because if we pad on the left we will have pad tokens and then the text, so the model still has to continue from pads. Also, during fine-tuning with flash_attention_2 it throws a warning to set padding_side='right'.
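
    A minimal sketch of what I mean (the checkpoint name is just a placeholder, and I am assuming the stock Hugging Face tokenizer API):

    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-it")  # placeholder checkpoint
    tokenizer.pad_token = tokenizer.eos_token

    # Fine-tuning (e.g. with flash_attention_2): pad on the right,
    # so every sequence starts with real tokens at position 0
    tokenizer.padding_side = "right"
    train_batch = tokenizer(["short prompt", "a somewhat longer prompt"], padding=True, return_tensors="pt")

    # Generation: pad on the left, so the last token of each row is a real token
    # and the model continues from the prompt rather than from a run of pads
    tokenizer.padding_side = "left"
    gen_batch = tokenizer(["short prompt", "a somewhat longer prompt"], padding=True, return_tensors="pt")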

  3. Does it make sense to use in-context learning after SFT?
    I will update the list.