Hi,
I see the below snippet in modeling_t5.py. I wanted to understand why use_cache is incompatible with gradient checkpointing.
Thanks.
Hi there. I'm facing the same problem in run_clm.py when I set --gradient_checkpointing true. However, I can't find any config option for this in run_clm.py. Does anyone know?
Does anybody know how to fix this?
Honestly, I've just ignored it. It automatically disables use_cache anyway. I'm about to remove the warning altogether.
use_cache=True is incompatible with gradient checkpointing. Setting use_cache=False
It's related to past_key_values. You can disable this warning by setting
model.config.use_cache = False when using gradient checkpointing during training,
but during inference make sure to set it back to True.
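For anyone else hitting this, here's a minimal sketch of that toggle. The checkpoint name ("gpt2") and the tiny generate call are just placeholders; the same use_cache / gradient_checkpointing interaction applies to T5 or any other model that supports caching:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Training: gradient checkpointing recomputes activations during the
# backward pass, so cached past_key_values can't be reused. Disabling
# the cache silences the warning.
model.gradient_checkpointing_enable()
model.config.use_cache = False
# ... run your training loop here ...

# Inference: turn checkpointing off and re-enable the cache so that
# generate() reuses past key/value states instead of recomputing
# attention over the full sequence at every decoding step.
model.gradient_checkpointing_disable()
model.config.use_cache = True

inputs = tokenizer("Hello", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0]))
```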
Could you please share what the purpose of use_cache is? Thanks.