Hi! I am using a GCP v4-16 TPU Pod to train an LLM. I am learning how to do large-scale distributed model training on TPUs, and I have run into a few issues using accelerate on TPU:
- Accelerate fails to recognize the XLA device; when I run the training script without accelerate, I do not have this issue. I have tried setting up a config with `accelerate config`, and it still fails to find the TPU device (see the sketch after this list).
- Does accelerate support multi-worker/multi-node TPU training, similar to `Multi-GPU` in `distributed_type`? I might be wrong here since I haven't used accelerate before - should I manually assign workers to accelerate, or does it handle that by itself?
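To make the first issue more concrete, here is roughly the check I'm running on the pod (a minimal sketch, simplified from the actual training script):

```python
# Minimal repro, simplified from my training script: compare which device each path sees.
import torch_xla.core.xla_model as xm
from accelerate import Accelerator

# Plain torch_xla: this returns an XLA device (e.g. "xla:0") on each TPU VM worker.
device = xm.xla_device()
print("torch_xla device:", device)

# Through accelerate: I expected the same XLA device here,
# but accelerator.device does not pick up the TPU for me.
accelerator = Accelerator()
print("accelerate device:", accelerator.device)
```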
Thanks!