Repeatedly decoding the same tokens after PEFT fine-tuning Whisper

Hi, first, thank you for open-sourcing such a good ASR project. I recently started investigating Whisper in my research and applied LoRA for parameter-efficient fine-tuning on my dataset (a 30 h Mandarin Chinese speech corpus). Before fine-tuning, Whisper achieved about 10% WER. After fine-tuning, however, decoding has a problem: it repeatedly outputs some tokens multiple times.

It looks like this:

Below are some of my code snippets for configuration: batch_size = 2, num_train_epochs = 3, fp16 = True.

        # lora config
        from peft import LoraConfig
        from transformers import Seq2SeqTrainingArguments

        config = LoraConfig(r=32, lora_alpha=64, target_modules=["q_proj", "v_proj"], lora_dropout=0.05, bias="none")

        # training args
        training_args = Seq2SeqTrainingArguments(
            output_dir=args.output_dir,  # change to a repo name of your choice
            per_device_train_batch_size=batch_size,  # batch_size = 2
            gradient_accumulation_steps=16 // batch_size,  # increase by 2x for every 2x decrease in batch size
            num_train_epochs=3,
            fp16=True,
            eval_accumulation_steps=1,  # otherwise predictions accumulate on GPU -> OOM warning!
            remove_unused_columns=False,  # required as the PeftModel forward doesn't have the signature of the wrapped model's forward
            label_names=["labels"],  # same reason as above
        )
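For context, applying a LoRA config like this to Whisper with PEFT typically looks like the sketch below. This is a configuration fragment, not my exact script; the checkpoint name "openai/whisper-small" is an assumption, so substitute whichever Whisper size you fine-tune:

```python
# Sketch: wrap a Whisper model with a LoRA adapter via PEFT.
# The checkpoint name below is an assumption for illustration.
from peft import LoraConfig, get_peft_model
from transformers import WhisperForConditionalGeneration

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
config = LoraConfig(r=32, lora_alpha=64, target_modules=["q_proj", "v_proj"],
                    lora_dropout=0.05, bias="none")
model = get_peft_model(model, config)  # only the LoRA weights remain trainable
model.print_trainable_parameters()    # sanity-check the trainable fraction
```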

It would be much appreciated if anyone has any idea about this issue, or please let me know if you need any more info/clues. Thanks!

I just found that I missed adding <|endoftext|> at the end of each sentence, because I call the tokenizer with add_special_tokens=False. Without this special end token, the model does not learn when/where to stop after fine-tuning. Now it works as normal.
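The fix can be sketched as follows. The token id 50257 for <|endoftext|> is illustrative only; in practice, read the real id from your tokenizer's eos_token_id rather than hard-coding it:

```python
# Minimal sketch of appending <|endoftext|> to label ids when the
# tokenizer was called with add_special_tokens=False.
EOS_TOKEN_ID = 50257  # illustrative id; use tokenizer.eos_token_id in practice

def add_eos(label_ids, eos_id=EOS_TOKEN_ID):
    """Append the end-of-text token unless it is already the last label."""
    if not label_ids or label_ids[-1] != eos_id:
        return label_ids + [eos_id]
    return label_ids

labels = [123, 456, 789]   # hypothetical token ids from the tokenizer
labels = add_eos(labels)   # -> [123, 456, 789, 50257]
```

Alternatively, simply call the tokenizer with add_special_tokens=True so it appends the end token for you.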

Could you please explain how you solved this problem? I am facing the same issue. Did you just set add_special_tokens=True in your training?
Thanks a lot =)