Hi community,
I have a basic intuition about how gradient_checkpointing works: it saves activations only at some layers and recomputes the unsaved ones during the backward pass, trading extra compute for memory efficiency.
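For context, I enable it roughly like this (assuming Hugging Face Transformers; `gpt2` is just a placeholder model):

```python
from transformers import AutoModelForCausalLM

# Placeholder model; any PreTrainedModel supports the same call.
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Turns on activation checkpointing for each transformer block.
model.gradient_checkpointing_enable(
    gradient_checkpointing_kwargs={"use_reentrant": False}
)
```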
I wonder how gradient_checkpointing is actually handled under the hood, and whether there is any way to control its behavior, for example if I want to reduce the extent of checkpointing.
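To make the second part concrete, here is a minimal sketch of my mental model in plain PyTorch, assuming `torch.utils.checkpoint` is what runs underneath; the `stride` knob for checkpointing only every n-th block is just my own illustration of "reducing the extent":

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class CheckpointedStack(nn.Module):
    """Toy stack where only every `stride`-th block is checkpointed."""

    def __init__(self, num_layers=8, hidden=64, stride=2):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU())
            for _ in range(num_layers)
        )
        self.stride = stride  # stride=1 would checkpoint every block

    def forward(self, x):
        for i, block in enumerate(self.blocks):
            if self.training and i % self.stride == 0:
                # Only the block's input is kept; its internal
                # activations are recomputed during backward.
                x = checkpoint(block, x, use_reentrant=False)
            else:
                # Regular forward: activations stay in memory.
                x = block(x)
        return x

model = CheckpointedStack(stride=2)  # checkpoint every other block
out = model(torch.randn(4, 64))
out.sum().backward()  # checkpointed blocks get re-run here
```

Is this roughly what happens internally, and is there a supported knob like this `stride`, or would I have to wrap layers manually like this?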
Thanks!