Hi,
I see the below snippet in modeling_t5.py. I wanted to understand why use_cache is incompatible with gradient checkpointing.
Thanks.
Hi there. I face the same problem in run_clm.py when I set --gradient_checkpointing true. However, I cannot find any config option for this that I can set in run_clm.py. Does anyone know?
Does anybody know how to fix this?
Honestly, I’ve just ignored it. It automatically disables the use_cache function. I’m about to remove the warning altogether.
use_cache=True is incompatible with gradient checkpointing. Setting use_cache=False
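For anyone landing here, a rough sketch of the behavior described above: when gradient checkpointing is on and the model is in training mode, a requested use_cache=True is overridden to False and a warning is logged. The helper name resolve_use_cache and its parameters are made up for illustration; this is not the actual code in modeling_t5.py, only my reading of what the warning implies.

```python
import logging

logger = logging.getLogger(__name__)


def resolve_use_cache(use_cache: bool, gradient_checkpointing: bool, training: bool) -> bool:
    """Hypothetical helper: return the use_cache value that would actually be used.

    Assumption: the override only applies while training with gradient
    checkpointing enabled, which matches the warning quoted in this thread.
    """
    if gradient_checkpointing and training and use_cache:
        # Same wording as the warning quoted above.
        logger.warning(
            "use_cache=True is incompatible with gradient checkpointing. "
            "Setting use_cache=False"
        )
        return False
    # In every other case the caller's choice is left untouched.
    return use_cache


# Training with checkpointing: the cache request is dropped.
print(resolve_use_cache(use_cache=True, gradient_checkpointing=True, training=True))
# Inference (not training): the cache request is kept.
print(resolve_use_cache(use_cache=True, gradient_checkpointing=True, training=False))
```

If the warning itself is the annoyance, setting model.config.use_cache = False on the loaded model before training should avoid it, as far as I know, since the override then never triggers. The cache only speeds up step-by-step generation, so turning it off should not change training results.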