Why is use_cache incompatible with gradient checkpointing?

I see the below snippet in modeling_t5.py. I wanted to understand why use_cache is incompatible with gradient checkpointing.
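For reference, the guard in question looks roughly like this. This is a paraphrased sketch, not the verbatim modeling_t5.py code; the function name and exact message here are illustrative:

```python
import logging

logger = logging.getLogger(__name__)

def resolve_use_cache(use_cache, gradient_checkpointing, training):
    # Gradient checkpointing saves memory by re-running each layer's
    # forward pass during the backward pass. Caching past_key_values from
    # the first forward pass would clash with that recomputation, so the
    # flag is forced off while training with checkpointing enabled.
    if use_cache and gradient_checkpointing and training:
        logger.warning(
            "`use_cache=True` is incompatible with gradient checkpointing. "
            "Setting `use_cache=False`..."
        )
        use_cache = False
    return use_cache
```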


Hi there. I face the same problem in run_clm.py when I set --gradient_checkpointing true. However, I can't find any option in run_clm.py to control this. Does anyone know?

Does anybody know how to fix this?

Honestly, I’ve just ignored it. It automatically disables use_cache anyway. I’m tempted to remove the warning altogether.

> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`

It's related to past_key_values. You can silence this warning by setting
model.config.use_cache = False when using gradient checkpointing during training,
but during inference make sure to set it back to True.
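To see what the cache actually buys you, here is a toy count of how many key/value projections happen per generated token with and without caching. The function and numbers are purely illustrative, not transformers internals:

```python
def kv_projections(prompt_len, new_tokens, use_cache):
    """Toy count of token positions whose keys/values get (re)computed
    while greedily generating `new_tokens` tokens after a prompt."""
    computed = 0
    for step in range(new_tokens):
        if use_cache:
            # only new positions are projected; past K/V come from the cache
            computed += prompt_len if step == 0 else 1
        else:
            # without a cache, the whole sequence so far is re-projected
            computed += prompt_len + step
    return computed

# 10-token prompt, 5 generated tokens
print(kv_projections(10, 5, use_cache=True))   # 14
print(kv_projections(10, 5, use_cache=False))  # 60
```

During training with gradient checkpointing there is no step-by-step generation to speed up, and the recomputed forward passes would invalidate any cached past_key_values anyway, which is why the flag gets turned off.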


Could you please share what the purpose of use_cache is? Thanks.
