SpeechT5 text-to-speech fine-tuning runtime error

I am fine-tuning SpeechT5 for a new language on my own dataset.
All the earlier steps ran successfully, but I got the following error when I tried to run
trainer.train():

RuntimeError: Sizes of tensors must match except in dimension 2. Expected size 16 but got size 256 for tensor number 1 in the list.
Can anyone help me?

Did you find a fix for this, or a workaround?
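
For anyone reading the message above: it comes from a concatenation along dimension 2, where every other dimension has to agree. It says the first tensor has size 16 in some other dimension, and the second tensor ("tensor number 1") has size 256 there. Below is a rough pure-Python sketch of that rule, not the actual SpeechT5 or PyTorch code. The helper name cat_shape_mismatches and the example shapes are made up for illustration; the real shapes in the failing batch are not known from this thread.

```python
def cat_shape_mismatches(shapes, dim):
    """Return (tensor_index, axis, expected, got) for each shape that
    breaks the rule: all axes except `dim` must match the first tensor."""
    ref = shapes[0]
    dim = dim % len(ref)  # allow negative dims such as -1
    problems = []
    for idx, shape in enumerate(shapes[1:], start=1):
        for axis, (expected, got) in enumerate(zip(ref, shape)):
            if axis != dim and expected != got:
                problems.append((idx, axis, expected, got))
    return problems


# Hypothetical shapes, chosen only so the numbers match the error text.
shapes = [(16, 80, 768), (256, 80, 512)]
for idx, axis, expected, got in cat_shape_mismatches(shapes, dim=2):
    print(f"tensor {idx}: axis {axis} expected {expected} but got {got}")
```

If this reading is right, printing the shape of every tensor in one batch just before trainer.train() may show which input carries the 256 where 16 is expected.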