[Open-to-the-community] Community week using JAX/Flax for NLP & CV :jax:

That should work! We are also working on dataset streaming for very large datasets; see the PR here: https://github.com/huggingface/datasets/pull/2375. RoBERTa large can fit a batch_size of up to 512 or 1024 on a TPUv3-8 at a sequence length of 128 (most of the time, one actually starts with a sequence length of just 128).
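
To put those numbers in perspective, here is a rough back-of-the-envelope sketch of what each core would handle. It assumes the batch sizes quoted above are global (summed over all 8 cores of the TPUv3-8) and that the batch is split evenly across cores; the helper names are made up for illustration and are not part of any library.

```python
# Assumption: 512 / 1024 are *global* batch sizes, split evenly over the cores.
NUM_TPU_CORES = 8   # a TPUv3-8 exposes 8 cores
SEQ_LENGTH = 128    # sequence length mentioned in the post


def per_device_batch_size(global_batch_size, num_devices=NUM_TPU_CORES):
    """Hypothetical helper: number of examples each core sees per step."""
    if global_batch_size % num_devices != 0:
        raise ValueError("global batch size must be divisible by the device count")
    return global_batch_size // num_devices


def tokens_per_step(global_batch_size, seq_length=SEQ_LENGTH):
    """Hypothetical helper: total tokens processed in one training step."""
    return global_batch_size * seq_length


for global_batch_size in (512, 1024):
    print(
        f"global batch {global_batch_size}: "
        f"{per_device_batch_size(global_batch_size)} examples per core, "
        f"{tokens_per_step(global_batch_size)} tokens per step"
    )
```

If the quoted batch sizes were meant per core instead, the per-step token counts would be 8 times larger.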

So this is definitely a doable project!
