Fine-tuning mt5-base to make it more ChatGPT-like

I am trying to fine-tune a model that works like ChatGPT for the Punjabi language, using mt5-base as the base model. However, I am not sure whether I should go ahead with it, since out of the box it does not generate real text: when I try to use it, I just get `<extra_id_0>` as the response. I have checked the tokenizers and they work fine with Punjabi. Can anyone please tell me how I should go about this?
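For reference, here is a minimal sketch of what I am running (assuming the `google/mt5-base` checkpoint and the standard `transformers` generation API):

```python
from transformers import MT5ForConditionalGeneration, MT5Tokenizer

tokenizer = MT5Tokenizer.from_pretrained("google/mt5-base")
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-base")

# A Punjabi prompt ("What is the capital of Punjab?");
# the tokenizer handles the Gurmukhi script fine.
inputs = tokenizer("ਪੰਜਾਬ ਦੀ ਰਾਜਧਾਨੀ ਕੀ ਹੈ?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)

# Without fine-tuning this prints sentinel tokens like <extra_id_0>
# instead of real text, since mT5 is pretrained only on span corruption.
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```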

The dataset I will be using is a high-quality instruction-following dataset in the Alpaca format.
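Each record follows the usual Alpaca schema (instruction / input / output). Here is a rough sketch of how I plan to flatten a record into a source/target pair for seq2seq fine-tuning (the prompt template is my own choice, not anything fixed):

```python
# Rough sketch: turn an Alpaca-style record into a (source, target)
# pair for seq2seq fine-tuning. Field names follow the Alpaca schema.
def to_seq2seq_pair(example: dict) -> tuple[str, str]:
    instruction = example["instruction"]
    context = example.get("input", "")
    if context:
        source = f"Instruction: {instruction}\nInput: {context}"
    else:
        source = f"Instruction: {instruction}"
    return source, example["output"]
```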

I have tried fine-tuning indic-gpt before, but it has a very small context length (1024 tokens), so I changed my base model.

Thanks in advance!

Is this dataset public?
Can you share some details?

Yes, it is available here: japneets/Alpaca_instruction_fine_tune_Punjabi_small · Datasets at Hugging Face
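It should load directly with the `datasets` library (split name assumed to be the default train split; check the dataset card for the exact fields):

```python
from datasets import load_dataset

# Load the Punjabi Alpaca-style instruction dataset from the Hub.
# (Split name assumed; check the dataset card for the exact schema.)
ds = load_dataset("japneets/Alpaca_instruction_fine_tune_Punjabi_small", split="train")
print(ds[0])
```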
