I would like to finetune an MT0 model with fp16, int8, or int4. However, the loss is always 0 because of NaN values. How can I fix this 0-loss issue?
[INFO|trainer.py:327] 2023-08-26 21:05:39,168 >> {'loss': 3.3959, 'learning_rate': 9.993190040434134e-07, 'train_runtime': 14.3111, 'train_samples_per_second': 8.944, 'train_num_samples_consumed': 128, 'job_progress': 0.0006809959565865078, 'epoch': 0.0}
[INFO|trainer.py:327] 2023-08-26 21:05:52,255 >> {'loss': 0.0, 'learning_rate': 9.986380080868269e-07, 'train_runtime': 13.0882, 'train_samples_per_second': 9.78, 'train_num_samples_consumed': 256, 'job_progress': 0.0013619919131730156, 'epoch': 0.01}
[INFO|trainer.py:327] 2023-08-26 21:06:05,317 >> {'loss': 0.0, 'learning_rate': 9.979570121302404e-07, 'train_runtime': 13.0619, 'train_samples_per_second': 9.799, 'train_num_samples_consumed': 384, 'job_progress': 0.002042987869759523, 'epoch': 0.01}
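For context, I suspect fp16's limited numeric range is involved: T5-family models like MT0 are known to produce activations that overflow fp16's maximum finite value (~65504), which turns them into inf and then NaN inside the loss. A tiny sketch of that failure mode (not my training code, just an illustration of the fp16 overflow mechanics):

```python
import torch

# fp16's largest finite value is 65504; anything bigger overflows to inf.
x = torch.tensor(70000.0, dtype=torch.float16)
print(x)                                # tensor(inf, dtype=torch.float16)
print(x - x)                            # inf - inf -> nan, the NaN I see in the loss
print(torch.finfo(torch.float16).max)   # 65504.0
```

So once any activation exceeds that range, the NaN propagates and the reported loss becomes meaningless.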