How can I load an LLM in 4-bit?

shamanez · August 2, 2023, 8:00am

I am working with this PEFT example of fine-tuning a model in 4-bit.

Although I pass load_in_4bit=True, I get the following log message:

Detected 8-bit loading: activating 8-bit loading for this model

Can someone please explain this to me?

My GPU is a Tesla T4 (16 GB).
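
For context, here is a minimal sketch of what a 4-bit load typically looks like with transformers and bitsandbytes. It is not the exact code from the PEFT example. The helper names `four_bit_settings` and `load_4bit` and the model ID in the usage comment are placeholders made up for illustration, and the NF4, double-quantization, and fp16 choices are assumptions rather than settings taken from this post.

```python
def four_bit_settings():
    """Return quantization settings as a plain dict.

    The keys are keyword arguments of transformers.BitsAndBytesConfig.
    The values (NF4, double quantization) are illustrative assumptions.
    """
    return {
        "load_in_4bit": True,
        "bnb_4bit_quant_type": "nf4",
        "bnb_4bit_use_double_quant": True,
    }


def load_4bit(model_id):
    """Load a causal LM with 4-bit weights (hypothetical helper).

    Needs torch, transformers, accelerate, bitsandbytes and a CUDA GPU.
    Imports are local so the settings above can be inspected without them.
    """
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(
        # fp16 compute is assumed here because the T4 lacks native bf16 support
        bnb_4bit_compute_dtype=torch.float16,
        **four_bit_settings(),
    )
    return AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant_config,
        device_map="auto",
    )


# Usage (placeholder model ID, not run here):
# model = load_4bit("your-org/your-model")
# print(getattr(model, "is_loaded_in_4bit", None))  # flag set by transformers, if present
# print(model.get_memory_footprint())               # bytes used by the loaded weights
```

Checking the loaded model directly, as in the commented usage lines, may be a more reliable way to see which precision is in effect than relying on the log message alone.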
