A "No GPU found" error is raised when I load the model. How can I quantize Llama in a TPU environment? Here is the code I used:
import torch
from transformers import BitsAndBytesConfig, LlamaForSequenceClassification

# 4-bit NF4 quantization config for bitsandbytes
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=False,
    bnb_4bit_quant_type="nf4",
)

# Load the classification head variant of Llama with the quantization config
base_model = LlamaForSequenceClassification.from_pretrained(
    CFG.MODEL_NAME,
    num_labels=CFG.NUM_LABELS,
    torch_dtype=torch.bfloat16,
    quantization_config=bnb_config,
)
base_model.config.pretraining_tp = 1
base_model.config.pad_token_id = tokenizer.pad_token_id
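For context on what I expect `bnb_4bit_quant_type="nf4"` to do: as I understand it, each block of weights is scaled by its absolute maximum and then snapped to one of 16 fixed levels. A minimal pure-Python sketch of that idea, using an evenly spaced illustrative codebook (the real NF4 codebook uses normal-distribution quantiles, not these values):

```python
# Illustrative 16-entry codebook in [-1, 1]; NOT the actual NF4 levels,
# which are quantiles of a standard normal distribution.
LEVELS = [i / 7.5 - 1.0 for i in range(16)]

def quantize_block(weights):
    """Scale a block by its absmax, then map each value to the nearest level index."""
    absmax = max(abs(w) for w in weights) or 1.0  # guard against an all-zero block
    idx = [min(range(16), key=lambda i, w=w: abs(w / absmax - LEVELS[i]))
           for w in weights]
    return idx, absmax

def dequantize_block(idx, absmax):
    """Reconstruct approximate weights from level indices and the stored absmax."""
    return [LEVELS[i] * absmax for i in idx]
```

Each weight is stored as a 4-bit index plus one shared `absmax` per block, which is where the memory saving comes from; the round-trip error is bounded by half the level spacing times `absmax`.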