Hi,
I am fine-tuning an mt5-small model on a custom dataset for a downstream task.
The model loss appears to be converging, but the output from `model.generate()` always starts with one specific token, e.g. `a`. For each input, I'm getting outputs like: "a <Generated Output for sentence 1>", "a <Generated Output for sentence 2>", …
I'm using beam search for decoding and have tried different beam widths and other parameters (like `repetition_penalty`, etc.), but the output from `generate()` still always starts with that same token `a`.
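For concreteness, these are roughly the decoding settings I've been varying (the values here are just illustrative, not my exact config), using the standard `transformers` `GenerationConfig`:

```python
from transformers import GenerationConfig

# Illustrative beam-search settings (example values, not my exact config)
gen_config = GenerationConfig(
    num_beams=4,              # have tried several beam widths
    repetition_penalty=1.2,   # and other sampling/penalty parameters
    early_stopping=True,
    max_new_tokens=64,
)

# This gets passed as model.generate(**inputs, generation_config=gen_config);
# the first generated token is the same regardless of these settings.
print(gen_config.num_beams)
```

Changing any of these parameters has no effect on the first token of the output.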
Is there any known reason why this might be happening?