Predicting only " " after training (S2T) Wav2Vec2CTC

I have the same issue: the loss is NaN, and after 1 epoch the model predicts only empty strings.
Have you found the root cause of the issue? Thanks.