Develop robust examples for model-parallel training on TPUs

We plan to create examples for model-parallel training that support many more model architectures, and are more modular, than the current example (transformers/examples/research_projects/jax-projects/model_parallel at main · huggingface/transformers · GitHub). We would like the outcome of this project to be a series of articles/blog posts that detail model-parallel and data-parallel training of HuggingFace transformers on TPUs/GPUs. It would be great if this project could be united with other projects that aim to pretrain a large model.
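To make the idea concrete, here is a minimal sketch of what such an example could look like in JAX, using the `jax.sharding` API to split a weight matrix across devices, Megatron-style. This is an illustration under assumed shapes and names (`forward`, `W`, the `"model"` mesh axis are all hypothetical), not the project's actual code; it runs on whatever devices are available (TPU cores, GPUs, or a single CPU):

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Build a 1-D device mesh over the available devices. The axis name
# "model" marks the model-parallel dimension.
devices = np.array(jax.devices())
mesh = Mesh(devices, axis_names=("model",))
n_dev = len(devices)

# A toy weight matrix, sharded column-wise across the "model" axis --
# the layout used for the first linear layer in Megatron-style
# tensor parallelism. (Hypothetical shapes for illustration.)
W = jnp.ones((8, 4 * n_dev))
W = jax.device_put(W, NamedSharding(mesh, P(None, "model")))

@jax.jit
def forward(x, W):
    # Each device computes its own column block of the output;
    # XLA inserts any needed cross-device communication automatically.
    return x @ W

x = jnp.ones((2, 8))
y = forward(x, W)
print(y.shape)  # (2, 4 * n_dev)
```

Data parallelism falls out of the same machinery: sharding the batch dimension of `x` instead of (or in addition to) the weight columns gives data-parallel or hybrid layouts, which is part of why a modular set of examples built on this API could cover many architectures.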