mT5 fine-tuning with fp16 yields zero loss

I am facing the issue of not being able to fine-tune mt5-small with fp16. There are a few discussions about this, but there is no concrete fix, or the fixes are not merged into the main branch yet.
Has anyone managed to fine-tune any mT5 variant using fp16?

I'm not able to use fp32 because it constantly gives me CUDA out-of-memory errors, and my GPU doesn't support bf16. Any recommendations or solutions?
Thanks to the community.
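For context, the common explanation for this symptom is that T5/mT5 activations can exceed the representable range of float16 (max ≈ 65504), so they overflow to inf, which then propagates to a NaN or zero loss. A minimal sketch of the overflow itself, assuming only NumPy (the exact activation value is illustrative, not taken from the model):

```python
import numpy as np

# float16 can only represent magnitudes up to ~65504; anything larger
# overflows to inf. T5/mT5 hidden activations are reported to exceed this
# range, which is why fp16 training can produce inf/NaN and a collapsed loss.
activation_fp32 = np.float32(70000.0)      # hypothetical large activation
activation_fp16 = np.float16(activation_fp32)

print(np.finfo(np.float16).max)  # 65504.0 — the fp16 ceiling
print(activation_fp16)           # inf — overflow, later becomes NaN in the loss
```

This is also why bf16 (same exponent range as fp32) works where fp16 does not.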

I have the same error too.