Quantization-Aware Training Error

Hey there, I’m currently fine-tuning a T5 model and would like to quantize it to reduce its size and make deployment easier.

Everything works fine except that my quantization-aware training (QAT) run always aborts with an error I cannot explain. I’m fine-tuning the model in Google Colab.

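from neural_compressor import QuantizationAwareTrainingConfig
from optimum.intel import INCSeq2SeqTrainer
from transformers import AutoModelForSeq2SeqLM, EarlyStoppingCallback

# Default QAT configuration from Intel Neural Compressor
quantization_config = QuantizationAwareTrainingConfig()

model = AutoModelForSeq2SeqLM.from_pretrained("Salesforce/codet5-small")

# args, tokenized_datasets, data_collator, tokenizer and compute_metrics
# are defined earlier in the notebook (not shown here)
trainer = INCSeq2SeqTrainer(
    model=model,
    quantization_config=quantization_config,
    args=args,
    train_dataset=tokenized_datasets['train'],
    eval_dataset=tokenized_datasets['validation'],
    data_collator=data_collator,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=2)],
)

training_results = trainer.train()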

While training the model, I receive the following error:


AssertionError Traceback (most recent call last)
in <cell line: 1>()
----> 1 training_results = trainer.train()

4 frames
/usr/local/lib/python3.10/dist-packages/torch/cuda/amp/grad_scaler.py in scale(self, outputs)
    164         # Short-circuit for the common case.
    165         if isinstance(outputs, torch.Tensor):
--> 166             assert outputs.is_cuda or outputs.device.type == 'xla'
    167             if self._scale is None:
    168                 self._lazy_init_scale_growth_tracker(outputs.device)

AssertionError:
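
As far as I can tell from the traceback, the failing assertion only checks that the tensor handed to GradScaler.scale lives on a CUDA or XLA device. Below is a minimal sketch of that same condition. The helper name passes_grad_scaler_check and the stand-in objects are hypothetical and not part of PyTorch; the stand-ins only imitate the two tensor attributes that the assertion reads, so the sketch runs without a GPU.

```python
from types import SimpleNamespace


def passes_grad_scaler_check(tensor) -> bool:
    """Mirror the condition asserted on line 166 of grad_scaler.py in the traceback."""
    return bool(tensor.is_cuda or tensor.device.type == "xla")


# Hypothetical stand-ins that expose the same attributes as a torch.Tensor.
cpu_loss = SimpleNamespace(is_cuda=False, device=SimpleNamespace(type="cpu"))
cuda_loss = SimpleNamespace(is_cuda=True, device=SimpleNamespace(type="cuda"))

print(passes_grad_scaler_check(cpu_loss))   # False: the case that raises AssertionError
print(passes_grad_scaler_check(cuda_loss))  # True
```

Calling the helper on a real tensor, such as the loss, would show whether it sits on a device the scaler accepts. If it reports False, my guess (an assumption I have not confirmed) is that mixed precision, for example fp16=True in args, is enabled while the model or the loss stays on the CPU.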

I can’t seem to find a solution for this, so I’d be thankful for any hints.