Pretrain GPT2 from scratch in Punjabi

GPT2 for Punjabi

Pretrain GPT2 on the Punjabi language to create a strong language generation model for Punjabi.

Model

A randomly initialized GPT2 model

Datasets

One can use the Punjabi Wikipedia articles dataset on Kaggle: Punjabi Wikipedia Articles | Kaggle
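Before training, the raw articles will probably need some light cleaning. Below is a minimal sketch of one way to do it, assuming the articles are written in Gurmukhi script (Unicode block U+0A00 to U+0A7F) and are already loaded as a list of strings. The helper names `gurmukhi_ratio` and `clean_articles` and the 0.5 threshold are made up for illustration; they are not part of the dataset or of any library.

```python
import unicodedata

# Gurmukhi Unicode block (assumption: the dataset uses Gurmukhi script).
GURMUKHI_START = 0x0A00
GURMUKHI_END = 0x0A7F


def gurmukhi_ratio(text: str) -> float:
    """Return the fraction of non-whitespace characters in the Gurmukhi block."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    hits = sum(1 for c in chars if GURMUKHI_START <= ord(c) <= GURMUKHI_END)
    return hits / len(chars)


def clean_articles(articles, min_ratio=0.5):
    """Normalize to NFC, collapse whitespace, and keep mostly-Gurmukhi texts.

    min_ratio is a hypothetical threshold; tune it on the real data.
    """
    kept = []
    for raw in articles:
        # NFC normalization, then collapse all whitespace runs to one space.
        text = " ".join(unicodedata.normalize("NFC", raw).split())
        if text and gurmukhi_ratio(text) >= min_ratio:
            kept.append(text)
    return kept
```

For example, `clean_articles(["  ਪੰਜਾਬੀ   ਭਾਸ਼ਾ ", "English only", ""])` keeps only the first entry, with the extra spaces collapsed. The cleaned texts can then be written one per line to a text file for tokenizer and model training.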

Available training scripts

A causal language modeling script for Flax is available here.

(Optional) Desired project outcome

The desired project outcome is a GPT2 model that can generate Punjabi text.


Hey @hgarg! :hugs:

Great to see someone working on Punjabi. There is definitely a lack of models for Punjabi on the Hugging Face Hub, so this can definitely help fill that gap.

Great idea and all the best!


Cool idea @bhavnicksm - would you like to join? :slight_smile: