I have an int8 model that I saved with .save_pretrained(). When I try to load it with .from_pretrained(), I get the error "RuntimeError: Only Tensors of floating point and complex dtype can require gradients".
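For context, the model was quantized and saved along these lines (a minimal sketch; the base checkpoint name and the bitsandbytes 8-bit config are stand-ins for my actual setup):

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Load the base model in 8-bit via bitsandbytes, then serialize it
base = AutoModelForCausalLM.from_pretrained(
    "gpt2",  # placeholder for the actual base checkpoint
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
base.save_pretrained("./model-8bit")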
I tried wrapping the load in a no-grad context:

import torch
from transformers import AutoModelForCausalLM

with torch.no_grad():
    model = AutoModelForCausalLM.from_pretrained("./model-8bit", device_map="auto")

but to no avail.
Any advice would be appreciated.