This was an issue a while back but seems to have resurfaced (see the earlier "T5 fp16 issue is fixed" thread).
I have tested the exact code below with `t5-small` and `t5-base`, and they work fine. However, with `t5-large` and/or `flan-t5-xl`, the model produces NaN outputs. This is solely a result of using half precision (ignore the multiple GPUs, strategy, etc.; I have tested every other variation):
```python
import lightning.pytorch as pl

trainer = pl.Trainer(
    precision="16",      # half precision is the culprit
    accelerator="gpu",
    strategy="auto",
    devices=4,
)
```
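To take Lightning out of the picture, here is the kind of minimal forward-pass check I'd use (a sketch, not a verified repro; assumes a single CUDA GPU and that `t5-large` fits in memory in fp16):

```python
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

# Load t5-large directly in fp16, no Lightning involved (sketch).
tokenizer = AutoTokenizer.from_pretrained("t5-large")
model = T5ForConditionalGeneration.from_pretrained(
    "t5-large", torch_dtype=torch.float16
).to("cuda")

inputs = tokenizer(
    "translate English to German: Hello there", return_tensors="pt"
).to("cuda")
labels = tokenizer("Hallo", return_tensors="pt").input_ids.to("cuda")

with torch.no_grad():
    out = model(**inputs, labels=labels)

# If the fp16 overflow occurs, the logits (and hence the loss) come back NaN.
print(torch.isnan(out.logits).any().item(), out.loss)
```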
I am using `transformers==4.28.1` and `lightning==2.0.0`.
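For what it's worth, since T5 was pretrained in bfloat16, I'd expect a bf16 run to sidestep the fp16 overflow; a sketch, assuming Ampere-or-newer GPUs:

```python
import lightning.pytorch as pl

# Same Trainer, but bfloat16 mixed precision instead of fp16
# (assumes bf16-capable hardware, e.g. A100 or RTX 30xx+).
trainer = pl.Trainer(
    precision="bf16-mixed",
    accelerator="gpu",
    strategy="auto",
    devices=4,
)
```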
Any ideas/help appreciated
Thanks!