ValueError fp16 lm_head.weight

I am trying to run run_translation.py with mt5-large and DeepSpeed enabled, using ds_config_zero3.json as the config file. However, when I run it, I get the following error:

ValueError: fp16 is enabled but the following parameters have dtype that is not fp16: lm_head.weight

Is there some config setting I’m missing that could help resolve this issue?
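For reference, here is a minimal sketch of what I believe the check is complaining about (this is my own diagnostic snippet, not code from run_translation.py, and it assumes the check simply walks the model's parameters):

```python
import torch
from transformers import MT5ForConditionalGeneration

# Load the model as run_translation.py would; the default dtype is fp32.
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-large")

# List every parameter whose dtype is not fp16. The ValueError above names
# lm_head.weight, so presumably a check like this found it still in fp32.
not_fp16 = [name for name, p in model.named_parameters() if p.dtype != torch.float16]
print(not_fp16)
```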

Hey, did you figure out how to resolve this? I’d be interested to learn what you did.

I ran the ASR example here, and it ran fine, but I noticed it had fp16 set to false. If I try to save memory by passing --fp16 on the command line, or by setting fp16=True when constructing TrainingArguments, I get the same error you report.
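For context, this is roughly how I enabled it programmatically (a sketch; output_dir is a placeholder, and I am assuming the same ZeRO-3 config file as the original post):

```python
from transformers import TrainingArguments

# Equivalent to passing --fp16 on the command line.
args = TrainingArguments(
    output_dir="out",                  # placeholder path
    fp16=True,                         # enabling this triggers the dtype error
    deepspeed="ds_config_zero3.json",  # same ZeRO-3 config as above (assumption)
)
```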