BitsAndBytesConfig is not compatible in a TPU environment

A "No GPU found" error is raised. How can I quantize Llama in a TPU environment? Here is the code I used:

import torch
from transformers import BitsAndBytesConfig, LlamaForSequenceClassification

# 4-bit NF4 quantization config (bitsandbytes backend)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=False,
    bnb_4bit_quant_type="nf4",
)

base_model = LlamaForSequenceClassification.from_pretrained(
    CFG.MODEL_NAME,
    num_labels=CFG.NUM_LABELS,
    torch_dtype=torch.bfloat16,
    quantization_config=bnb_config,  # this is what raises "No GPU found" on TPU
)
base_model.config.pretraining_tp = 1
base_model.config.pad_token_id = tokenizer.pad_token_id

Hi,

That’s correct: bitsandbytes quantization only supports CUDA. See GitHub - TimDettmers/bitsandbytes: Accessible large language models via k-bit quantization for PyTorch.

You might need to take a look at other quantization methods: Quantization
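In the meantime, here is a minimal sketch of a portable workaround (assuming the goal is simply to run the same script on GPU and TPU, not to actually quantize on TPU): load the weights in bfloat16, which TPUs support natively, and only attach the bitsandbytes config when CUDA is available. CFG and tokenizer are assumed to be defined as in your question.

import torch
from transformers import BitsAndBytesConfig, LlamaForSequenceClassification

quant_kwargs = {}
if torch.cuda.is_available():  # bitsandbytes requires a CUDA GPU
    quant_kwargs["quantization_config"] = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
        bnb_4bit_use_double_quant=False,
        bnb_4bit_quant_type="nf4",
    )

base_model = LlamaForSequenceClassification.from_pretrained(
    CFG.MODEL_NAME,              # assumed config object from the question
    num_labels=CFG.NUM_LABELS,
    torch_dtype=torch.bfloat16,  # no CUDA dependency when quant_kwargs is empty
    **quant_kwargs,
)
base_model.config.pad_token_id = tokenizer.pad_token_id

On TPU the branch is skipped, so from_pretrained never touches bitsandbytes and the "No GPU found" error does not occur; on a CUDA GPU you still get the 4-bit NF4 model.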
