Accelerate TPU training

Hi! I am using a GCP v4-16 TPU Pod to train an LLM. I am learning how to do large-scale distributed model training on TPUs, and I have run into a couple of issues using Accelerate with TPUs:

  1. Accelerate fails to recognize the XLA device. When I run the training script without Accelerate, I do not have this issue. I have tried setting up a config with `accelerate config`, and it still fails to find the TPU device (a minimal device check is included after this list).
  2. Does Accelerate support multi-worker/multi-node TPU training, similar to `MULTI_GPU` in `distributed_type`? I might be wrong here since I haven't used Accelerate before: do I need to manually assign workers to Accelerate, or does it handle this by itself? A sketch of my training loop is also included below.
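
Here is roughly the check I am running for issue 1 (a minimal sketch; it assumes `torch_xla` and `accelerate` are installed on the TPU VM, and the print labels are just placeholders):

```python
# Minimal device check (sketch). torch_xla sees the TPU on its own,
# but the Accelerator does not seem to pick it up on my setup.
import torch_xla.core.xla_model as xm
from accelerate import Accelerator

# This works without Accelerate and prints an xla:* device.
print("torch_xla device:", xm.xla_device())

# This is where I expected Accelerate to detect the TPU automatically.
accelerator = Accelerator()
print("accelerate device:", accelerator.device)
```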

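And this is the structure of my training loop for issue 2, with a toy model and dataset standing in for the real LLM so the sketch is self-contained. I am not assigning workers or ranks anywhere, just relying on `accelerator.prepare` -- is that the intended usage on a TPU Pod?

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator()  # no manual worker/rank assignment anywhere

# Toy model and data standing in for the real LLM and dataset.
model = torch.nn.Linear(128, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
dataset = TensorDataset(torch.randn(64, 128), torch.randint(0, 2, (64,)))
loader = DataLoader(dataset, batch_size=8)

model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

for inputs, labels in loader:
    logits = model(inputs)
    loss = torch.nn.functional.cross_entropy(logits, labels)
    accelerator.backward(loss)
    optimizer.step()
    optimizer.zero_grad()
```
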
Thanks!