[Open-to-the-community] Community week using JAX/Flax for NLP & CV :jax:

Training GPT2 in Bengali would be pretty huge for the Bengali NLP research community.

Here’s the topic link: PreTrain GPT2 from scratch in Bengali


Unfortunately I will not be able to attend the talks from 30/06 - 02/07. Will they be recorded and made available?

I think so! (@Suzana might know better)


Hey @patrickvonplaten,

Would it be possible to train an mBART model from scratch in JAX/Flax?

Maybe only for a couple of languages, to fit the time frame.

Hey @bhavnicksm

Sure, why not!

mBART will be merged soon in JAX/Flax, but if you want to train from scratch you could also use BART or T5.

And yeah, starting with a few languages makes sense to fit the time frame.

Can I train Wav2Vec2 with JAX?

Is there code for BART pre-training using huggingface?

No, we haven’t added a BART pre-training script yet. A T5 pre-training script should be available in a week.

But if someone wants to, feel free to take a shot at it. The most important part is the BART denoising function. Then one could just leverage the `run_summarization_script` with the denoising dataset to pre-train BART.
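For anyone curious what such a denoising function might look like, here is a rough sketch of BART-style text infilling: random spans of tokens are each replaced by a single mask token, with span lengths drawn from a Poisson distribution. This is only an illustration, not the script mentioned above. The names `text_infilling` and `sample_poisson`, the parameters, and the way span starts are picked are all assumptions made for this example, and sentence permutation is left out entirely.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) sample using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def text_infilling(token_ids, mask_id, mask_ratio=0.3, poisson_lambda=3.0, seed=0):
    """Hypothetical helper: collapse random token spans into one mask token each.

    Simplification: each position starts a span with probability `mask_ratio`,
    and at most round(len(token_ids) * mask_ratio) tokens are removed in total.
    """
    rng = random.Random(seed)
    budget = int(round(len(token_ids) * mask_ratio))  # tokens we may still hide
    noised, i = [], 0
    while i < len(token_ids):
        if budget > 0 and rng.random() < mask_ratio:
            span = min(sample_poisson(poisson_lambda, rng), budget, len(token_ids) - i)
            noised.append(mask_id)  # the whole span becomes a single mask token
            budget -= span
            i += span
            if span > 0:
                continue
            # span == 0: a mask was inserted, the current token is kept below
        noised.append(token_ids[i])
        i += 1
    return noised


if __name__ == "__main__":
    original = list(range(10, 30))  # stand-in for real token ids
    corrupted = text_infilling(original, mask_id=0, seed=1)
    print("input :", corrupted)  # what the encoder would see
    print("target:", original)   # what the decoder should reconstruct
```

The corrupted sequence would be the model input and the original sequence the target, which is why a seq2seq training script could in principle be reused once the dataset is built this way.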

Patrick is working on FlaxWav2vec2, but it will take some time since it’s a complex model and pre-training is also a bit complex.


Super excited to get some formal training on JAX. I’ve been trying to get started with JAX for many weeks now, but lack of motivation and a busy work schedule prevented it. Looks like at least one problem is solved! Btw, if I only want to learn JAX and not work on a project, would that be okay? I’m asking because a project might not fit into my schedule.


Sure, try to take in as much as possible during the event!

Hi @Suzana, please share here if you have any updates on this. Thank you!

Wow, cool initiative. Is it too late to participate? (I sent the Google form yesterday.)

Nope, not at all, you could still join!

Sounds cool, submitted the form :grin:

Awesome, now you can explore the project ideas and leave a comment on any project you want to join :slight_smile:

Oh, I thought there would be a Slack channel, isn’t there?

Can I train a GPT2 base model for the Romanian language using a 24 GB Romanian text dataset :smiley: ?

Yes, there’s a Slack channel as well, you’ll be invited to it by tomorrow. But we use the forum to post the ideas to make them visible to everyone.


Hi, I submitted the form, but haven’t received a Slack channel invitation. Is this normal (am I too late?)