Solution found! In my case, setting `torch_dtype=torch.bfloat16` for both the base model and the PPO model solved the problem. For example:
```python
import torch
from transformers import AutoModelForCausalLM
from trl import AutoModelForCausalLMWithValueHead

# Load the base model in bfloat16.
model = AutoModelForCausalLM.from_pretrained(
    config.model_name,
    torch_dtype=torch.bfloat16,
    device_map=device_map,
)

# Wrap it with a value head for PPO, keeping the same dtype.
ppo_model = AutoModelForCausalLMWithValueHead.from_pretrained(
    model,
    torch_dtype=torch.bfloat16,
    is_trainable=True,
)
```