Why does the Hugging Face Falcon model use model.config.use_cache = False? Why wouldn't it want the decoder to reuse computations during fine-tuning?

0sunfire0 · July 7, 2023, 9:53pm

No. The model was downloaded and loaded into memory in bf16 to save VRAM, but after that you can change the dtype to whatever you want. I load it in torch.float16, because the free GPU doesn't support bf16.
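For anyone who wants to pick the dtype automatically instead of hard-coding it, here is a rough sketch of the idea. The helper names `pick_dtype_name` and `load_falcon` and the default model id `tiiuae/falcon-7b` are my own assumptions for illustration, not something from the tutorial; depending on your transformers version you may also need `trust_remote_code=True` for Falcon.

```python
def pick_dtype_name(bf16_supported: bool) -> str:
    """Return the torch dtype name to load with (hypothetical helper)."""
    # bf16 when the GPU supports it, otherwise fall back to fp16.
    return "bfloat16" if bf16_supported else "float16"


def load_falcon(model_id: str = "tiiuae/falcon-7b"):
    """Load a causal LM in bf16 or fp16 depending on the GPU (sketch).

    The model id is an assumed default; swap in the checkpoint you use.
    """
    # Imported here so pick_dtype_name works even without torch installed.
    import torch
    from transformers import AutoModelForCausalLM

    bf16_ok = torch.cuda.is_available() and torch.cuda.is_bf16_supported()
    dtype = getattr(torch, pick_dtype_name(bf16_ok))
    return AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=dtype)
```

Calling `load_falcon()` downloads the checkpoint, so only the small helper is cheap to try on its own.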
