LongT5 fine-tuning

See here for why it failed: Mixed precision for bfloat16-pretrained models

tl;dr: T5 was pretrained in bf16 and you fine-tuned in fp16. bf16 has a much larger dynamic range than fp16, so values that fit comfortably in bf16 overflow in fp16 and turn into inf/nan.
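
A minimal sketch of the fix, assuming the Hugging Face `Seq2SeqTrainingArguments` API; the checkpoint name, output directory, and hyperparameters are placeholders, not from the original post:

```python
from transformers import (
    LongT5ForConditionalGeneration,
    Seq2SeqTrainingArguments,
)

# Load the bf16-pretrained checkpoint (model name is an assumed placeholder).
model = LongT5ForConditionalGeneration.from_pretrained("google/long-t5-tglobal-base")

# The fix: fine-tune in bf16 (or full fp32) instead of fp16.
training_args = Seq2SeqTrainingArguments(
    output_dir="longt5-finetuned",
    bf16=True,   # same numeric range as pretraining; needs Ampere+ GPU or TPU
    fp16=False,  # fp16's narrower range overflows bf16-scale values to inf/nan
    learning_rate=1e-4,
    per_device_train_batch_size=2,
)
# ...the rest of the Seq2SeqTrainer setup stays the same.
```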