When I call AutoModel.from_pretrained(…, load_in_8bit=True), does the transformers library load an already-quantized version, or does it first load the model as it was saved (typically 32-bit) and then quantize it?