DeepSpeed inference and Infinity offload with bitsandbytes 4-bit loaded models

I am getting this error too.

Edit: There appears to be support now: