Hi,
I am fine-tuning an mt5-small model on a custom dataset for a downstream task.
The training loss appears to be converging, but the output from model.generate() always starts with the same token:
e.g. `a`. For each input, I’m getting outputs like: “a <Generated Output for sentence 1>”, “a <Generated Output for sentence 2>”, …
I’m using beam search for decoding and have tried different beam widths and other parameters (like repetition_penalty), but the output from generate() always starts with the same token, `a`.
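For reference, here is a minimal sketch of my setup (the checkpoint name and generation arguments below are placeholders; I load the base google/mt5-small so the snippet runs as-is):

```python
from transformers import AutoTokenizer, MT5ForConditionalGeneration

# Stand-in for the fine-tuned checkpoint path.
model_name = "google/mt5-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = MT5ForConditionalGeneration.from_pretrained(model_name)

inputs = tokenizer("Example input sentence.", return_tensors="pt")
output_ids = model.generate(
    **inputs,
    num_beams=4,             # beam search
    repetition_penalty=1.2,  # one of the parameters I tried varying
    max_new_tokens=32,
)
# Decoding with skip_special_tokens=False shows whether the stray
# leading token is a special token (e.g. <pad>) or a real subword.
decoded = tokenizer.batch_decode(output_ids, skip_special_tokens=False)
print(decoded)
```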
Is there any known reason why this might be happening?