Why does the Hugging Face Falcon model use model.config.use_cache = False? Why wouldn't it want the decoder to reuse computations during fine-tuning?

Nope, the model was downloaded and loaded into memory in bf16 to save VRAM, but after that you can change it to whatever you want. I load it in torch.float16 because the free GPU doesn't support bf16.
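
Here's a rough sketch of what that looks like in code, in case it helps. The helper names (`choose_dtype_name`, `load_for_finetuning`) and the float32 fallback on CPU are my own assumptions for illustration, not something from the Falcon example. It picks bf16 when the GPU supports it, falls back to float16 otherwise, and turns the cache off the same way the fine-tuning script does.

```python
def choose_dtype_name(cuda_available: bool, bf16_supported: bool) -> str:
    """Return the name of a torch dtype suited to the hardware (hypothetical helper)."""
    if not cuda_available:
        # CPU fallback is an assumption for this sketch, not from the thread.
        return "float32"
    # bf16 needs hardware support; float16 works on older GPUs.
    return "bfloat16" if bf16_supported else "float16"


def load_for_finetuning(model_id: str):
    """Load a causal LM in the chosen dtype with the KV cache disabled."""
    # Imported lazily so choose_dtype_name works without torch installed.
    import torch
    from transformers import AutoModelForCausalLM

    cuda = torch.cuda.is_available()
    name = choose_dtype_name(cuda, cuda and torch.cuda.is_bf16_supported())
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=getattr(torch, name),
    )
    # The KV cache only speeds up token-by-token generation; a training
    # forward pass processes the whole sequence at once, so it is turned off.
    model.config.use_cache = False
    return model
```

You would call it as `load_for_finetuning("<your model id>")`, and you can set `model.config.use_cache = True` again before calling `generate` for inference.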