[Open-to-the-community] Community week using JAX/Flax for NLP & CV :jax:

Learn how to use JAX/Flax with Transformers :hugs: + :jax:

We have partnered up with Google’s Flax, JAX, and Cloud teams to organize a new community week from July 7th to July 14th. We want to teach you how to effectively use JAX/Flax for Natural Language Processing (NLP) and Computer Vision (CV).

Free access to a TPUv3-8 VM will kindly be provided by the Google Cloud team :exploding_head:!

We can guarantee TPU access for the first 400 participants, so it might be worth signing up quickly :wink:.

TL;DR: All important announcements will be made in an internal Slack channel. To get access to the Slack, you can sign up here. Important information is summarized here :sunglasses:.

What it is about :rocket:

The goal of the JAX/Flax community week is to make compute-intensive NLP and CV projects (like pre-training BERT, GPT2, CLIP, ViT) practicable for a wider audience of engineers and researchers.

To do so, we will teach participants how to effectively use JAX/Flax on TPU and help them define a fun project to complete during the community week.

How does it work :face_with_monocle:

Participants can propose ideas for an interesting NLP and/or CV project. Teams of 2 to 5 will then be formed around the most promising and interesting projects.

Make sure to read this document on how to propose projects, comment on other participants’ project ideas, and create a team.

To help each team to successfully finish their project, we have organized talks by leading scientists and engineers from Google, Hugging Face, and the open-source NLP & CV community. The talks will take place before the community week from June 30th to July 2nd. Make sure to attend the talks to get the most out of your participation!

Each team is then given free access to a TPUv3-8 VM from July 7th to July 14th. In addition, we will provide training examples in JAX/Flax for a variety of NLP and Vision models to kick-start your project. During the week, we’ll make sure to answer any questions you might have about JAX/Flax and Transformers. We will try to help each team as much as possible to successfully complete their project!

At the end of the community week, each team can submit a demo of their project. All demonstrations will be evaluated by a jury, and the top-3 demos will be awarded a prize! In a few days, we will share a document with a few suggestions on how to submit your project with a nice demo.

What do I need to do to participate :wave:

An interest in JAX/Flax and ideally an idea for a fun NLP/CV project! Compute in the form of a TPUv3-8 will kindly be provided by Google’s Cloud team. To sign up for the event, please use this Google form to fill in your name, email address, and Hugging Face account name. You will then be added to an internal Slack channel in which all the relevant information will be given.
An overview of the event is shown here.

What do I get :gift:

  • Enjoy a bit of the Hugging Face vibe by joining the JAX/Flax community week.

  • Learn about JAX/Flax and how to use it effectively for compute-intensive training.

  • Attend interesting presentations by leading researchers and engineers in NLP & CV.

  • Win a prize if you are among the top-3 projects!

Summary - Timeline :calendar:

23.06. - Official announcement of the community week. Make sure to sign up using this Google form.

23.06. - 30.06. Participants will be added to an internal Slack channel. Project ideas can be proposed here and groups of 2-10 are formed. Read this document for more information.

30.06. - Release of all relevant training scripts in JAX/Flax, as well as documents on how to set up a TPU, how to use the training scripts, how to submit a demo, and tips & tricks for JAX/Flax and for efficient use of the :hugs: hub.

30.06. - 2.07. Talks about JAX/Flax, TPU, Transformers, Computer Vision & NLP will be held.

7.07. Start of the community week! Access to a TPUv3-8 will be given to each team.

7.07. - 14.07. The Hugging Face, JAX/Flax, and Cloud teams will be available for any questions or problems the teams might run into.

15.07. Access to TPU is deactivated and community week officially ends.

16.07. Deadline for each team to submit a demo.

Open-sourcely yours,
The :hugs:, :jax:, Flax and Cloud team


Really cool Patrick! @thomasdehaene might be interested to pre-train on Dutch data.


I am in! Great as usual! I would like to create a Spanish GPT-2 w/ Spanish OSCAR Corpus @patrickvonplaten


Really amazing. :grinning: Also, I’m not able to access the Google form.


Thanks for reporting. Should be accessible to everyone now :slight_smile:


This sounds great! Feel free to create a project idea with more details :slight_smile: This section should explain in detail how to propose and join projects :slight_smile:

Awesome! Feel free to post in this category as explained here


This sounds awesome! What’s the expected experience level? I have some experience with CV in other frameworks, but none with Hugging Face / NLP. Is this also for me? I have already signed up anyway!


Sure! We try to make transformers and datasets as easy to understand as possible, and we provide multiple examples of how to use transformers with JAX here. Also, we will be answering as many questions as possible during the week, so I’m sure this event can be interesting for you :slight_smile:
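To give newcomers a feel for the framework, here is a tiny, self-contained sketch of the core JAX pattern: a jit-compiled gradient update on a toy linear model. This is a toy example of mine, not one of the official training scripts:

```python
import jax
import jax.numpy as jnp

def loss_fn(params, x, y):
    # Toy linear model: predictions = x @ w + b, mean squared error loss.
    preds = x @ params["w"] + params["b"]
    return jnp.mean((preds - y) ** 2)

@jax.jit  # compiles the whole update step for CPU/GPU/TPU
def update(params, x, y, lr=0.1):
    grads = jax.grad(loss_fn)(params, x, y)
    # Apply plain gradient descent to every leaf of the params pytree.
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)

params = {"w": jnp.zeros(3), "b": jnp.array(0.0)}
x, y = jnp.ones((4, 3)), jnp.ones(4)
for _ in range(100):
    params = update(params, x, y)
print(float(loss_fn(params, x, y)))  # close to 0.0
```

The same `jit`/`grad`/`tree_map` shape scales up to full Flax model training; the TPU mainly makes the compiled `update` step fast.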

Would training a RoBERTa large model, first for 256 max length and then for 512, on the Spanish portion of mC4 be too much for this event?


Probably not too much. As we have seen in past research, large models actually converge much faster. Starting with a 256 max length is a good idea!

This also depends upon how much data you want to use though.

That’s one of the issues. The Spanish portion of mC4 seems to be 1TB of uncompressed text. For sure not all of it is needed, but it’d be great to be able to train on at least half or a third of it.

That should work! We are also working on dataset streaming for very large datasets (see the PR here: https://github.com/huggingface/datasets/pull/2375), and RoBERTa large can fit a batch size of up to 512 or 1024 on a TPUv3-8 at a sequence length of 128 (most of the time one actually starts with just a 128 sequence length).

So this is definitely a doable project!
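To illustrate the sequence-length point above: pre-training data is typically packed into fixed-length chunks before batching. A minimal pure-Python sketch (the function name is mine, and dropping the tail is a simplification of what the real grouping step does):

```python
def pack_into_chunks(token_ids, seq_len=128):
    """Group a flat stream of token ids into fixed-length training examples.

    Tokens that do not fill a final chunk are simply dropped here,
    which is a simplification of real grouping steps.
    """
    n = (len(token_ids) // seq_len) * seq_len  # largest multiple of seq_len
    return [token_ids[i:i + seq_len] for i in range(0, n, seq_len)]

chunks = pack_into_chunks(list(range(300)), seq_len=128)
print(len(chunks), len(chunks[0]))  # 2 full chunks of 128; last 44 tokens dropped
```

At a batch size of 512 and a sequence length of 128, one training step then consumes 512 × 128 = 65,536 tokens, which is why even a fraction of the 1TB corpus goes a long way.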


That’s really cool! mC4 can be painful depending on what languages you are dealing with. Streaming will be great for this.

Would it be viable to pre-train ALBERT or DeBERTa-v2 models in this event? Hard to make a decision on which one!

We haven’t added ALBERT and DeBERTa-v2 in Flax yet. ALBERT should be easy to add though.

ELECTRA (which is already available in Flax) is also a good option since it’s much more sample-efficient. But we don’t have a pre-training script for ELECTRA yet.


This is the opportunity we were waiting for! Count me in @mrm8488 :hugs:


Great! Created some of my project ideas here.

  1. Pre-train GPT-2 from scratch in Bengali
  2. Pre-train T5 from scratch in Bengali
  3. Pre-train RoBERTa (MLM model) from scratch for programming languages

Not sure what the status of the T5 pre-training script is. I would love to contribute and adapt the given MLM and CLM scripts to T5 if it’s not done yet.
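For context on what a T5 pre-training script has to do, the objective is span corruption: random spans are replaced by sentinel tokens in the input, and the model learns to reconstruct them. A heavily simplified sketch, with all names mine and span placement much cruder than the real sampling:

```python
import random

def corrupt_spans(tokens, span_len=3, n_spans=2, seed=0):
    """Replace n_spans non-overlapping spans with sentinel tokens.

    Returns (inputs, targets): inputs keep the surrounding text plus
    sentinels; targets list each sentinel followed by the removed span.
    """
    rng = random.Random(seed)
    # Sampling starts on a span_len grid guarantees the spans never overlap.
    starts = sorted(rng.sample(range(0, len(tokens) - span_len, span_len), n_spans))
    inputs, targets, prev = [], [], 0
    for i, s in enumerate(starts):
        sentinel = f"<extra_id_{i}>"
        inputs += tokens[prev:s] + [sentinel]
        targets += [sentinel] + tokens[s:s + span_len]
        prev = s + span_len
    inputs += tokens[prev:]
    return inputs, targets

inp, tgt = corrupt_spans([f"t{i}" for i in range(12)])
# 12 tokens, two 3-token spans removed: 8 input tokens, 8 target tokens
```

The real objective samples span lengths and a corruption rate (around 15%) rather than using fixed spans, but the input/target shape is the same.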


Awesome, can’t wait for JAX/Flax + Hugging Face + TPUs :partying_face::heart::fire::man_dancing:! Already working on Japanese text classification using pre-trained BERT transformers.


The T5 pre-training script should be merged by next week :slight_smile: