Urgent: I am training a Whisper Small model on a custom dataset to transcribe Swahili audio to Swahili text. Training lowers the validation loss, but the WER goes up, regardless of the number of epochs. I have attached snapshots of the outputs. I have tried several combinations of hyperparameters and preprocessing steps, but nothing works. I have also tried a pretrained Swahili ASR model trained on Common Voice 11. Although the WER drops at the 4th epoch, it rises again at the 5th, and little changes even after 15 epochs.
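
For reference, here is a minimal sketch of how I understand WER to be computed: word-level edit distance divided by the number of reference words. This is not my actual evaluation code. The function names (`normalize`, `edit_distance`, `wer`) and the example sentence are made up for illustration, and the normalization step (lowercasing and stripping punctuation) is an assumption. It shows that casing and punctuation alone can change the WER even when every word is correct, which may or may not be related to what I am seeing.

```python
import re


def normalize(text):
    # Assumed normalization: lowercase, drop punctuation, collapse whitespace.
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def edit_distance(ref_tokens, hyp_tokens):
    # Levenshtein distance over word lists (substitutions, insertions, deletions).
    prev = list(range(len(hyp_tokens) + 1))
    for i, ref_word in enumerate(ref_tokens, 1):
        cur = [i]
        for j, hyp_word in enumerate(hyp_tokens, 1):
            cost = 0 if ref_word == hyp_word else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # match or substitution
        prev = cur
    return prev[-1]


def wer(references, hypotheses):
    # Corpus-level WER: total word errors / total reference words.
    errors, words = 0, 0
    for ref, hyp in zip(references, hypotheses):
        ref_tokens = ref.split()
        errors += edit_distance(ref_tokens, hyp.split())
        words += len(ref_tokens)
    if words == 0:
        raise ValueError("references contain no words")
    return errors / words


# Made-up example: the hypothesis differs only in casing and punctuation.
refs = ["Habari ya asubuhi."]
hyps = ["habari ya asubuhi"]

raw_wer = wer(refs, hyps)  # 2 of 3 words count as errors
norm_wer = wer([normalize(r) for r in refs], [normalize(h) for h in hyps])
print(raw_wer, norm_wer)
```

The validation loss is computed per token during training, while the WER is computed on the decoded text, so as far as I understand the two do not have to move together.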