See here for why it failed: Mixed precision for bfloat16-pretrained models
tl;dr: T5 was pretrained in bf16 and you fine-tuned in fp16. bf16 has a much larger numeric range than fp16, so those large bf16-scale values overflow and turn into nan in fp16.
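
A minimal sketch (not from the original post) illustrating the range mismatch with PyTorch: fp16 tops out around 65504 while bf16 reaches roughly 3.4e38, so an activation that is perfectly fine in bf16 overflows to inf when cast to fp16, and downstream ops turn that into nan.

```python
import torch

# fp16 max is ~65504; bf16 max is ~3.39e38
print(torch.finfo(torch.float16).max)
print(torch.finfo(torch.bfloat16).max)

x = torch.tensor(1e6, dtype=torch.bfloat16)  # representable in bf16
y = x.to(torch.float16)                      # overflows to inf in fp16

print(y)      # inf
print(y - y)  # nan -- how the nans surface during fp16 fine-tuning
```

The usual workaround is to fine-tune in bf16 (or full fp32) instead of fp16, so the pretrained weights and activations stay within a representable range.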