4-bit finetuning an LLM: "No inf checks were recorded for this optimizer." if I don't use Abirate/english_quotes

I am trying to learn a bit about finetuning LLMs in practice.
I am using this Google Colab in its default configuration.
It works fine if I don't change anything, but as soon as I try to change the dataset I get this error:


AssertionError                            Traceback (most recent call last)

<ipython-input-23-5cbcc1f7f806> in <cell line: 23>()
     21 )
     22 model.config.use_cache = False  # silence the warnings. Please re-enable for inference!
---> 23 trainer.train()

3 frames

/usr/local/lib/python3.10/dist-packages/transformers/trainer.py in train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
   1537             self._inner_training_loop, self._train_batch_size, args.auto_find_batch_size
   1538         )
-> 1539         return inner_training_loop(
   1540             args=args,
   1541             resume_from_checkpoint=resume_from_checkpoint,

/usr/local/lib/python3.10/dist-packages/transformers/trainer.py in _inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)
   1886                         optimizer_was_run = scale_before <= scale_after
   1887                     else:
-> 1888                         self.optimizer.step()
   1889                         optimizer_was_run = not self.accelerator.optimizer_step_was_skipped

/usr/local/lib/python3.10/dist-packages/accelerate/optimizer.py in step(self, closure)
    131                     self._last_scale = self.scaler.get_scale()
    132                     new_scale = True
--> 133                 self.scaler.step(self.optimizer, closure)
    134                 self.scaler.update()
    135                 scale_after = self.scaler.get_scale()

/usr/local/lib/python3.10/dist-packages/torch/cuda/amp/grad_scaler.py in step(self, optimizer, *args, **kwargs)
    370             self.unscale_(optimizer)
--> 372         assert len(optimizer_state["found_inf_per_device"]) > 0, "No inf checks were recorded for this optimizer."
    374         retval = self._maybe_opt_step(optimizer, optimizer_state, *args, **kwargs)

AssertionError: No inf checks were recorded for this optimizer.

How do I get the Colab to work with custom datasets, either uploaded directly or hosted on Hugging Face?

So something must be wrong with the dataset, right?
Well, that is what I thought as well, but even if I literally take the JSONL file from Abirate/english_quotes on Hugging Face and upload it to my own Hugging Face account, I still get the same error. The data is identical. I don't know how to debug this further. Please help.
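Before blaming the trainer, it can help to sanity-check the JSONL file itself. Below is a small stdlib-only sketch; the field names ("quote", "author", "tags") are an assumption based on what Abirate/english_quotes uses, so adjust them to your own schema. Once the file validates, it can typically be loaded with `load_dataset("json", data_files="quotes.jsonl")` from the `datasets` library.

```python
import json
import tempfile
from pathlib import Path

def validate_jsonl(path, required_fields=("quote",)):
    """Return the number of valid records; raise on the first bad line."""
    count = 0
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)  # raises if the line is not valid JSON
            for field in required_fields:
                if field not in record:
                    raise ValueError(f"line {lineno}: missing field {field!r}")
            count += 1
    return count

# Minimal usage example with a throwaway file mimicking the quotes schema:
sample = [
    {"quote": "Be yourself.", "author": "Oscar Wilde", "tags": ["life"]},
    {"quote": "Stay hungry.", "author": "Steve Jobs", "tags": ["work"]},
]
tmp = Path(tempfile.mkdtemp()) / "quotes.jsonl"
tmp.write_text("\n".join(json.dumps(r) for r in sample), encoding="utf-8")
print(validate_jsonl(tmp))  # 2
```

This won't fix a trainer-side error, but it rules out malformed lines or missing fields as the cause.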

I removed --mixed_precision="fp16" from the training script and that fixed the error for me.
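For context: the assertion comes from torch's AMP GradScaler, which fires when fp16 mixed precision is enabled but the scaler finds no gradients to inspect. If you configure training through `TrainingArguments` rather than an `accelerate` launch flag, the equivalent change is the `fp16` field. A minimal sketch (the other argument values here are placeholders, not the exact Colab's settings):

```python
from transformers import TrainingArguments

# Sketch only: with fp16=False the Trainer never goes through the
# GradScaler code path that raises "No inf checks were recorded".
args = TrainingArguments(
    output_dir="outputs",
    per_device_train_batch_size=4,
    fp16=False,  # equivalent to dropping --mixed_precision="fp16"
)
```

Note this trades the error for slower, higher-memory fp32 training, so it is a workaround rather than a root-cause fix.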

Thank you very much. Removing the "fp16" setting worked for me too.