Finetune Llama with PPOTrainer

Harshvir · July 21, 2023, 9:22pm

I am trying to fine-tune Llama with the PPOTrainer class from TRL, following a similar tutorial that fine-tunes GPT-2 on the IMDB dataset.
But I keep getting this error when logging to wandb: ValueError: autodetected range of [nan, nan] is not finite

Also, many PPO-related values such as ‘ppo/loss/policy’, ‘ppo/loss/value’, ‘ppo/loss/total’, ‘ppo/policy/entropy’, etc., are NaN.
Refer to this notebook (a copy of the tutorial notebook, but with a different model) for the error.

Dong237 · September 28, 2023, 8:51am

Hi Harshvir! I am running into exactly the same situation while testing with a small GPT-NeoX model. Did you manage to solve this problem? I would really appreciate it if you could share the solution. Thanks!

Dong237 · September 29, 2023, 8:48am

Solution found! In my case, setting torch_dtype to torch.bfloat16 for both the base model and the PPO model solved the problem.

For example:

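import torch
from transformers import AutoModelForCausalLM
from trl import AutoModelForCausalLMWithValueHead

# Load the base model in bfloat16
# (config.model_name and device_map are defined elsewhere in the script)
model = AutoModelForCausalLM.from_pretrained(
    config.model_name,
    torch_dtype=torch.bfloat16,
    device_map=device_map,
)

# Wrap it with a value head for PPO, requesting bfloat16 here as well
ppo_model = AutoModelForCausalLMWithValueHead.from_pretrained(
    model,
    torch_dtype=torch.bfloat16,
    is_trainable=True,
)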
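The thread does not say why bfloat16 helps. One plausible reason, offered here as an assumption rather than a confirmed diagnosis, is range: float16 can only represent values up to 65504, so a large intermediate value overflows to inf, and arithmetic such as inf - inf then produces NaN. bfloat16 keeps float32's 8-bit exponent (maximum about 3.4e38) and gives up mantissa precision instead. The stdlib-only sketch below emulates both formats with `struct`; `to_fp16` and `to_bf16` are hypothetical helpers, and `to_bf16` truncates rather than rounds, so it only approximates a real bfloat16 conversion.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round x to IEEE 754 half precision; values that overflow become +/-inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


def to_bf16(x: float) -> float:
    """Approximate bfloat16 by keeping the top 16 bits of the float32 bit pattern.

    This truncates the mantissa (round-toward-zero), whereas real conversions
    usually round to nearest; x must fit in float32.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


big = 70000.0  # slightly above the float16 maximum of 65504
print(to_fp16(big))                 # inf
print(to_bf16(big))                 # 69632.0
print(to_fp16(big) - to_fp16(big))  # nan
```

In float16 the value is lost entirely and the subtraction yields NaN, while in bfloat16 the value survives with only its low-order digits gone, which is consistent with the fix reported above.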
