No benefit from turning on gradient_checkpointing: True

I was LoRA finetuning the LLaMA 70B model and turned on gradient_checkpointing: True in my training config, but it has no effect on memory consumption at all; I see no difference whether I set the flag to False or True. Any idea why that would be the case?
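
For context, here is a minimal sketch of the kind of setup I mean, assuming Hugging Face transformers + peft (the checkpoint name, LoRA hyperparameters, and dataset below are placeholders, not my exact config):

```python
# Minimal sketch of the setup, assuming Hugging Face transformers + peft.
# Checkpoint name, LoRA hyperparameters, and dataset are placeholders.
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments
from peft import LoraConfig, get_peft_model

# Base model (placeholder checkpoint; in practice a 70B Llama checkpoint).
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-70b-hf")

# LoRA adapters on top of the frozen base model (placeholder hyperparameters).
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# This is the flag in question: flipping it between True and False
# makes no visible difference to GPU memory usage in my runs.
training_args = TrainingArguments(
    output_dir="./llama-70b-lora",
    per_device_train_batch_size=1,
    gradient_checkpointing=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,  # placeholder for my actual dataset
)
trainer.train()
```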