Using gradient_checkpointing=True in Trainer causes an error with LLaMA

Hey,

I am trying to fine-tune LLaMA using the transformers library. I noticed that when I set gradient_checkpointing=True in the Trainer args, I get the following error:

Expects BACKWARD_PRE or BACKWARD_POST state but got HandleTrainingState.FORWARD
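
For reference, this is roughly how I'm setting things up (the checkpoint name and the tiny dummy dataset below are just placeholders to show the shape of the setup, not my actual training data):

```python
import torch
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder; my actual checkpoint differs
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

# Tiny dummy dataset just to illustrate the setup
enc = tokenizer(["Hello world"] * 8, padding="max_length", max_length=32)
train_dataset = Dataset.from_dict({"input_ids": enc["input_ids"], "labels": enc["input_ids"]})

args = TrainingArguments(
    output_dir="llama-ft",
    per_device_train_batch_size=1,
    gradient_checkpointing=True,  # turning this on is what triggers the error
    bf16=True,
)

Trainer(model=model, args=args, train_dataset=train_dataset).train()
```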

Has anyone come across this before?

@sgugger I have seen you respond to gradient_checkpointing questions before, so I thought I would tag you.

This happens when I set use_reentrant=False in the call to torch.utils.checkpoint.checkpoint; when I set use_reentrant=True instead, I get an error when torch.compile() runs on the model.
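
To be concrete about which flag I'm toggling, here is a simplified standalone example of the checkpoint call (not the actual modeling_llama code, just an illustration of the two variants):

```python
import torch
from torch.utils.checkpoint import checkpoint

layer = torch.nn.Linear(16, 16)
x = torch.randn(4, 16, requires_grad=True)

# Variant 1: non-reentrant checkpointing -> in my run this is what hits the
# "Expects BACKWARD_PRE or BACKWARD_POST state" error during training
out = checkpoint(layer, x, use_reentrant=False)

# Variant 2: reentrant checkpointing -> trains, but fails once torch.compile()
# is applied to the model
out = checkpoint(layer, x, use_reentrant=True)

out.sum().backward()
```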