Hey,
I am trying to fine-tune Llama using the transformers library. I noticed that when I set gradient_checkpointing=True in the TrainingArguments, I get the following error:
Expects BACKWARD_PRE or BACKWARD_POST state but got HandleTrainingState.FORWARD
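For reference, here is roughly how I am setting things up. This is a minimal sketch; the output directory, batch size, and other hyperparameters here are placeholders, and the actual run loads a Llama checkpoint and my own dataset:

```python
from transformers import TrainingArguments

# Placeholder values for illustration; only gradient_checkpointing
# matters for reproducing the error.
args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,
    gradient_checkpointing=True,  # enabling this triggers the error
)
```

The error only appears with gradient_checkpointing=True; training runs fine with it off.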
Has anyone come across this before?
@sgugger I have seen you respond to gradient_checkpointing questions before, so I thought I would tag you.