There are a few minor nuances that matter during fine-tuning and inference of LLMs. I am experimenting with Gemma-2 and Phi-3, and here are the little things that I don't quite get:
-
Some LLMs do not have a pad token. Does it matter whether I set it to the unknown token or to eos? For example:
tokenizer.pad_token = tokenizer.eos_token
tokenizer.add_eos_token = True
tokenizer.pad_token_id = tokenizer.convert_tokens_to_ids(tokenizer.pad_token)
or just
tokenizer.pad_token = tokenizer.eos_token
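To make the concern concrete, here is a minimal sketch in plain Python (no tokenizer needed) of one place where the choice could matter during fine-tuning. The token ids and the helper `build_labels` are made up for illustration; -100 is the label value that PyTorch's cross-entropy loss ignores by default. If pad and eos share an id, masking labels by token id also hides the real EOS from the loss, while masking by attention mask keeps it.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_labels(input_ids, attention_mask, pad_id, mask_by="attention"):
    """Hypothetical helper: copy input_ids and blank out padding positions."""
    if mask_by == "attention":
        # Only positions the attention mask marks as padding are ignored.
        return [tok if m == 1 else IGNORE_INDEX
                for tok, m in zip(input_ids, attention_mask)]
    if mask_by == "id":
        # Every token equal to pad_id is ignored, including a real EOS
        # when pad_id == eos_id.
        return [tok if tok != pad_id else IGNORE_INDEX for tok in input_ids]
    raise ValueError("mask_by must be 'attention' or 'id'")


EOS_ID = 2        # made-up id, for illustration only
PAD_ID = EOS_ID   # mirrors tokenizer.pad_token = tokenizer.eos_token

# One right-padded training example: three text tokens, EOS, two pads.
input_ids = [5, 6, 7, EOS_ID, PAD_ID, PAD_ID]
attention_mask = [1, 1, 1, 1, 0, 0]

labels_by_id = build_labels(input_ids, attention_mask, PAD_ID, mask_by="id")
labels_by_mask = build_labels(input_ids, attention_mask, PAD_ID, mask_by="attention")

print(labels_by_id)    # the real EOS at index 3 is masked out
print(labels_by_mask)  # the real EOS at index 3 is kept as a target
```

Whether a given training pipeline masks by id or by attention mask depends on the collator in use, so this only shows the possible failure mode, not what any particular library does.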
-
Does the padding side matter? Here it states that "Since LLMs are not trained to continue from pad tokens, your input needs to be left-padded", which I do not get: if we pad from the left, we have the pad tokens first and then the text, so it seems the model still has to continue from pad. Also, during fine-tuning with flash_attention_2 it throws a warning telling me to set padding_side='right'.
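A minimal sketch of what the quoted sentence seems to mean, in plain Python with made-up token ids and a hypothetical helper `pad_batch`. During generation, new tokens are appended at the end of each row, so what matters is which token sits in the last position. With left padding the last position is always real text and the pads at the front are hidden by the attention mask; with right padding the shorter row ends in a pad.

```python
def pad_batch(sequences, pad_id, padding_side="left"):
    """Hypothetical helper: pad token-id lists to equal length.

    Returns (input_ids, attention_mask), where the mask is 1 for real
    tokens and 0 for padding.
    """
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        pads, real = [pad_id] * n_pad, [1] * len(seq)
        if padding_side == "left":
            input_ids.append(pads + list(seq))
            attention_mask.append([0] * n_pad + real)
        elif padding_side == "right":
            input_ids.append(list(seq) + pads)
            attention_mask.append(real + [0] * n_pad)
        else:
            raise ValueError("padding_side must be 'left' or 'right'")
    return input_ids, attention_mask


PAD_ID = 0  # made-up id, for illustration only
batch = [[11, 12, 13], [21]]  # two prompts of different length

left_ids, left_mask = pad_batch(batch, PAD_ID, "left")
right_ids, right_mask = pad_batch(batch, PAD_ID, "right")

# The token each row would be continued from during generation.
last_left = [row[-1] for row in left_ids]
last_right = [row[-1] for row in right_ids]

print(left_ids, left_mask)
print(right_ids, right_mask)
print(last_left, last_right)
```

During training the loss is computed at every position rather than by continuing from the last one, which may be why right padding is acceptable there, but that part is my assumption.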
-
Does it make sense to use in-context learning after SFT?
I will update the list.