Hello,
we plan on creating examples for model-parallel training that support many more model architectures and are more modular than the current example (transformers/examples/research_projects/jax-projects/model_parallel at main · huggingface/transformers · GitHub). We would like the outcome of this project to be a series of articles/blog posts that detail model-parallel and data-parallel training of HuggingFace transformers on TPUs/GPUs. It would be great if this project could be combined with other projects that aim to pretrain a large model.
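
To make the intended scope a bit more concrete, here is a rough sketch of combined data- and model-parallel sharding using JAX's `jax.sharding` API. The mesh layout, axis names (`"data"`, `"model"`), and tensor sizes are illustrative assumptions, not code from the existing example:

```python
# A minimal sketch, assuming JAX >= 0.4 with the jax.sharding API;
# axis names, mesh layout, and sizes are illustrative only.
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

devices = np.array(jax.devices())
n = len(devices)
# Split devices into a 2D mesh: "data" axis for data parallelism,
# "model" axis for model (tensor) parallelism.
dp = 2 if n % 2 == 0 and n > 1 else 1
mesh = Mesh(devices.reshape(dp, n // dp), axis_names=("data", "model"))
dp_size, mp_size = mesh.shape["data"], mesh.shape["model"]

# Shard activations along the batch ("data") axis and the weight matrix
# along its output-feature ("model") axis; sizes are chosen to divide evenly.
x = jnp.ones((8 * dp_size, 512))        # (batch, hidden)
w = jnp.ones((512, 256 * mp_size))      # (hidden, ffn)
x = jax.device_put(x, NamedSharding(mesh, P("data", None)))
w = jax.device_put(w, NamedSharding(mesh, P(None, "model")))

@jax.jit
def forward(x, w):
    # Under jit, XLA's GSPMD partitioner propagates the input shardings
    # and inserts the required collectives automatically.
    return jnp.dot(x, w)

y = forward(x, w)
print(y.shape, y.sharding)  # sharded (batch, ffn) output
```

The articles could then walk through extending this kind of sharding pattern from a single matmul to full transformer blocks across different architectures.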