When I call AutoModel.from_pretrained(…, load_in_8bit=True), does the transformers library load an already-quantized version, or does it first load the model as it was saved (typically 32-bit) and then quantize it?