A "No GPU found" error is raised when I load the model. How can I quantize Llama in a TPU environment? Here is the code I used:
import torch
from transformers import BitsAndBytesConfig, LlamaForSequenceClassification

# 4-bit NF4 quantization config for bitsandbytes
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=False,
    bnb_4bit_quant_type="nf4",
)

# Load the classification head variant of Llama with the quantization config
base_model = LlamaForSequenceClassification.from_pretrained(
    CFG.MODEL_NAME,
    num_labels=CFG.NUM_LABELS,
    torch_dtype=torch.bfloat16,
    quantization_config=bnb_config,
)
base_model.config.pretraining_tp = 1
base_model.config.pad_token_id = tokenizer.pad_token_id
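For context on what I expect `bnb_4bit_quant_type="nf4"` to do: as I understand it, each block of weights is scaled by its absolute maximum and then snapped to one of 16 fixed levels. A minimal pure-Python sketch of that idea, using an evenly spaced illustrative codebook (the real NF4 codebook uses normal-distribution quantiles, not these values):

```python
# Illustrative 16-entry codebook in [-1, 1]; NOT the actual NF4 levels,
# which are quantiles of a standard normal distribution.
LEVELS = [i / 7.5 - 1.0 for i in range(16)]

def quantize_block(weights):
    """Scale a block by its absmax, then map each value to the nearest level index."""
    absmax = max(abs(w) for w in weights) or 1.0  # guard against an all-zero block
    idx = [min(range(16), key=lambda i, w=w: abs(w / absmax - LEVELS[i]))
           for w in weights]
    return idx, absmax

def dequantize_block(idx, absmax):
    """Reconstruct approximate weights from level indices and the stored absmax."""
    return [LEVELS[i] * absmax for i in idx]
```

Each weight is stored as a 4-bit index plus one shared `absmax` per block, which is where the memory saving comes from; the round-trip error is bounded by half the level spacing times `absmax`.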