Hi! I am trying to optimize inference on a MarianMTModel, following the Hugging Face guide for optimizing inference on a single GPU. Upon trying to load it as a mixed-int8 model:
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint, load_in_8bit=True).to(device)
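For reference, here is the minimal snippet I am running (the checkpoint name below is just an example; any MarianMT checkpoint should behave the same for me):

```python
import torch
from transformers import AutoModelForSeq2SeqLM

device = torch.device("cuda:0")
checkpoint = "Helsinki-NLP/opus-mt-en-de"  # example Marian checkpoint

# This is the call that fails for me when load_in_8bit=True
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint, load_in_8bit=True).to(device)
```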
I get an error telling me to pass a device map:
ValueError: A device map needs to be passed to run convert models into mixed-int8 format. Please run `.from_pretrained` with `device_map='auto'`
Adding the `device_map`:
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint, device_map="auto", load_in_8bit=True).to(device)
results in the following error:
ValueError: MarianMTModel does not support `device_map='auto'` yet.
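From reading the Accelerate docs, my understanding is that an explicit device map skips `infer_auto_device_map`, which is the step that needs the `_no_split_modules` attribute MarianMTModel apparently does not define yet. Would something like the sketch below be a sane workaround? This is just my guess; I have not confirmed that `{"": 0}` plays well with int8 loading:

```python
from transformers import AutoModelForSeq2SeqLM

checkpoint = "Helsinki-NLP/opus-mt-en-de"  # example Marian checkpoint

# Sketch of a possible workaround: an explicit device map that places the
# whole model on GPU 0, so transformers does not have to infer one.
# Note: no .to(device) afterwards; accelerate already places the int8 model.
model = AutoModelForSeq2SeqLM.from_pretrained(
    checkpoint,
    device_map={"": 0},
    load_in_8bit=True,
)
```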
I have also tried recreating my Conda environment from scratch, and I still get the same errors. Has anyone run into similar issues?
I am using PyTorch 1.12.0, Accelerate 0.15.0, bitsandbytes 0.35.4, and transformers 4.24.0.