Hi,
I see the below snippet in modeling_t5.py. I wanted to understand why use_cache is incompatible with gradient checkpointing.
Thanks.
Hi there. I face the same problem in run_clm.py when I set --gradient_checkpointing true. However, I don't find any config I can set in run_clm.py to deal with it. Does anyone know?
Does anybody know how to fix this?
Honestly, I’ve just ignored it. It automatically disables use_cache for you. I’m about to remove the warning altogether.
The warning "use_cache=True is incompatible with gradient checkpointing. Setting use_cache=False" is related to past_key_values. You can silence it by setting model.config.use_cache = False when using gradient checkpointing during training, but during inference make sure to set it back to True.
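To make that concrete, here is a minimal sketch of the toggle described above. It builds a tiny, randomly initialized T5 from a config (the sizes are illustrative, chosen just to avoid downloading a checkpoint) and shows where the two settings flip between training and inference:

```python
from transformers import T5Config, T5ForConditionalGeneration

# Tiny randomly initialized T5 (illustrative sizes, no checkpoint download).
config = T5Config(vocab_size=64, d_model=32, d_kv=8, d_ff=64,
                  num_layers=2, num_heads=4)
model = T5ForConditionalGeneration(config)

# Training: gradient checkpointing recomputes activations during backward,
# so cached past_key_values would be invalid -- disable the cache.
model.gradient_checkpointing_enable()
model.config.use_cache = False

# ... training loop goes here ...

# Inference: turn checkpointing off and the cache back on, so that
# generate() reuses past_key_values instead of recomputing attention
# over the whole prefix at every decoding step.
model.gradient_checkpointing_disable()
model.config.use_cache = True
```

If you use the Trainer, the same effect is what --gradient_checkpointing true triggers internally; the manual config flip above is only needed when you manage the model yourself.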
Could you please share what the purpose of use_cache is? Thanks.