Differences between GPTQ and NF4 with bitsandbytes

What are the key differences between GPTQ and NF4 quantisation with bitsandbytes? Are there reasons to expect one to have advantages over the other?

I’ve been benchmarking GPTQ against bitsandbytes with NF4. See below for some data.

Perplexity, Memory, and Speed Results

fLlama-7B (2 GB shards), bitsandbytes NF4 quantisation:

  • PPL: 8.8, GPU memory: 4.7 GB, speed: 12.2 tokens/s

Llama-7B-GPTQ-4bit-128:

  • PPL: 9.3, GPU memory: 4.8 GB, speed: 21.4 tokens/s

fLlama-13B (4 GB shards), bitsandbytes NF4 quantisation:

  • PPL: 8.0, GPU memory: 8.2 GB, speed: 7.9 tokens/s

Llama-13B-GPTQ-4bit-128:

  • PPL: 7.8, GPU memory: 8.5 GB, speed: 15.0 tokens/s
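
For comparison, here’s roughly how the GPTQ side can be loaded (via AutoGPTQ) and how a tokens/s figure can be taken. This is only a sketch: the repo name, prompt, and generation settings below are placeholders, not necessarily the exact ones I used.

import time
import torch
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

# Placeholder repo name; substitute the GPTQ checkpoint you want to test.
gptq_id = "TheBloke/LLaMa-7B-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(gptq_id)
gptq_model = AutoGPTQForCausalLM.from_quantized(
    gptq_id,
    device="cuda:0",
    use_safetensors=True,
)

# Rough speed check: time one greedy generation and divide new tokens by seconds.
inputs = tokenizer("The quick brown fox", return_tensors="pt").to("cuda:0")
torch.cuda.synchronize()
start = time.time()
out = gptq_model.generate(**inputs, max_new_tokens=128, do_sample=False)
torch.cuda.synchronize()
new_tokens = out.shape[1] - inputs.input_ids.shape[1]
print(f"{new_tokens / (time.time() - start):.1f} tokens/s")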

And here is my bitsandbytes (bnb) config:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,  # quantises the quantisation constants, saving roughly 0.4 bits/param with minimal quality loss
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# model_id and cache_dir are defined elsewhere in my script
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # for inference use "auto"; for training use device_map={"": 0}
    trust_remote_code=True,
    cache_dir=cache_dir,
)
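
For reference, here’s a minimal sketch of how perplexity can be measured with a sliding window over a held-out corpus, using the model loaded above. The dataset (WikiText-2), context length, and stride below are illustrative choices, not necessarily the ones behind the numbers above; absolute PPL shifts with these settings, so comparisons only hold when they match.

import torch
from datasets import load_dataset
from transformers import AutoTokenizer

# Assumed evaluation corpus; any held-out text works, but PPL values
# are only comparable when measured on the same data.
test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
tokenizer = AutoTokenizer.from_pretrained(model_id)
encodings = tokenizer("\n\n".join(test["text"]), return_tensors="pt")

max_length = 2048  # Llama context window
stride = 512       # assumed stride; smaller is more accurate but slower
seq_len = encodings.input_ids.size(1)

nlls = []
prev_end = 0
for begin in range(0, seq_len, stride):
    end = min(begin + max_length, seq_len)
    trg_len = end - prev_end  # tokens actually scored in this window
    input_ids = encodings.input_ids[:, begin:end].to(model.device)
    target_ids = input_ids.clone()
    target_ids[:, :-trg_len] = -100  # mask context tokens out of the loss

    with torch.no_grad():
        loss = model(input_ids, labels=target_ids).loss
    nlls.append(loss)

    prev_end = end
    if end == seq_len:
        break

ppl = torch.exp(torch.stack(nlls).mean())
print(f"PPL: {ppl.item():.2f}")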