How to load a model with from_pretrained() without requiring gradients

I have an int8 model that I saved with .save_pretrained(). When I try to load it with .from_pretrained(), I get the error “RuntimeError: Only Tensors of floating point and complex dtype can require gradients”.

I tried

import torch
from transformers import AutoModelForCausalLM
with torch.no_grad():
    model = AutoModelForCausalLM.from_pretrained("./model-8bit", device_map="auto")

but to no avail.
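
For reference, the same message can be reproduced outside of transformers. The sketch below assumes (this is not confirmed from the loading code) that the error arises when an int8 weight tensor is wrapped in torch.nn.Parameter with its default requires_grad=True. torch.no_grad() only stops operations from recording gradients; it does not change that default flag, which would explain why the context manager has no effect. The helper name try_make_parameter is hypothetical and exists only for this illustration.

```python
import torch


def try_make_parameter(dtype, requires_grad=True):
    """Hypothetical helper: wrap a small tensor of the given dtype in a Parameter.

    Returns None on success, or the RuntimeError message on failure.
    """
    data = torch.zeros(2, dtype=dtype)
    try:
        torch.nn.Parameter(data, requires_grad=requires_grad)
    except RuntimeError as err:
        return str(err)
    return None


# Everything runs under no_grad to mirror the attempt above.
with torch.no_grad():
    # int8 data with the default requires_grad=True
    int8_default = try_make_parameter(torch.int8)
    # int8 data with gradients explicitly disabled on the Parameter itself
    int8_frozen = try_make_parameter(torch.int8, requires_grad=False)
    # floating point data with the default requires_grad=True
    float_default = try_make_parameter(torch.float16)

print("int8, default flag:      ", int8_default)
print("int8, requires_grad=False:", int8_frozen)
print("float16, default flag:   ", float_default)
```

If the assumption holds, only the first case reports an error, which suggests the flag has to be turned off where the int8 parameters are created rather than around the from_pretrained() call.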

Any advice would be appreciated :slight_smile: