T5-XXL MLM distributed training?

Hi there,
I want to train T5-XXL on a domain corpus (unsupervised, masked training). I've got 2 GPUs with 40 GB each at hand, so I need to make use of Accelerate and DeepSpeed (rough sketch of what I mean below).
The example script run_t5_mlm_flax.py from the Flax examples does not support Accelerate :frowning:
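
Roughly what I have in mind on the PyTorch side is something like the sketch below. This is my own assumption, not taken from any HF example: ZeRO stage 3 with CPU offload through accelerate's DeepSpeedPlugin, with dummy data just to keep it self-contained, so the exact arguments may need adjusting for your accelerate version.

```python
# Sketch only (my assumption, not from any HF example): ZeRO stage 3 with CPU
# offload so the ~11B T5-XXL parameters fit on 2x40GB GPUs.
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import T5ForConditionalGeneration
from accelerate import Accelerator, DeepSpeedPlugin

ds_plugin = DeepSpeedPlugin(
    zero_stage=3,                    # shard params, grads and optimizer states
    offload_optimizer_device="cpu",  # keep optimizer states in CPU RAM
    offload_param_device="cpu",      # keep idle parameters in CPU RAM
    gradient_accumulation_steps=16,
)
accelerator = Accelerator(mixed_precision="bf16", deepspeed_plugin=ds_plugin)

model = T5ForConditionalGeneration.from_pretrained("google/t5-v1_1-xxl")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# dummy token ids just to keep the sketch self-contained
dummy_ids = torch.zeros((8, 512), dtype=torch.long)
train_loader = DataLoader(TensorDataset(dummy_ids, dummy_ids), batch_size=1)

model, optimizer, train_loader = accelerator.prepare(model, optimizer, train_loader)
# ... then the usual loop with accelerator.backward(loss),
# started on both GPUs via `accelerate launch`
```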

The script run_mlm_no_trainer.py from the PyTorch examples supports Accelerate, but not T5's span-corruption pre-training objective.
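
To make clear what I mean by T5's training mode: whole spans are replaced by sentinel tokens in the input and the target reconstructs them, whereas run_mlm_no_trainer.py (as far as I can see) only does BERT-style per-token masking via DataCollatorForLanguageModeling. A hand-made toy example, not taken from any of the scripts:

```python
# Toy illustration of T5 span corruption (hand-made example, not from the scripts):
# spans in the input are replaced by sentinel tokens, and the target lists the
# dropped spans, each introduced by its sentinel.
from transformers import T5TokenizerFast

tokenizer = T5TokenizerFast.from_pretrained("t5-small")

# original sentence: "The quick brown fox jumps over the lazy dog"
corrupted_input = "The <extra_id_0> fox jumps <extra_id_1> the lazy dog"
target = "<extra_id_0> quick brown <extra_id_1> over <extra_id_2>"

enc = tokenizer(corrupted_input, return_tensors="pt")
labels = tokenizer(target, return_tensors="pt").input_ids
# a T5 model would then be trained seq2seq-style with model(**enc, labels=labels)
```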

Is it true that Hugging Face's Flax support is in “maintenance mode”?

thanks :slight_smile:
Daniel

I think Accelerate is designed for PyTorch, not Flax.