[Kaggle] TPUVM doesn't allow setting nprocs > 1

stderr: WARNING:root:Unsupported nprocs (2), ignoring...

Setting the number of processes to anything greater than 1 during `accelerate config` seems to cause a cascade of errors, while `num_processes: 1` works without problems. Here’s the config I’m using:

compute_environment: LOCAL_MACHINE
distributed_type: TPU
downcast_bf16: 'no'
machine_rank: 0
main_training_function: main
mixed_precision: 'no'
num_machines: 1
num_processes: 2
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false

To reproduce, install accelerate and run `accelerate test` with this config file on a Kaggle TPU VM v3-8.
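
In case it helps, here is a minimal sketch of the reproduction as a script. The file name `tpu_config.yaml` and the helper names (`write_config`, `build_command`, `run_repro`) are placeholders I made up for illustration; the YAML is the config from this post, and the only accelerate call assumed is `accelerate test --config_file <path>`.

```python
import subprocess
import tempfile
from pathlib import Path

# Config copied from this post; num_processes is the only value that varies.
CONFIG_TEMPLATE = """\
compute_environment: LOCAL_MACHINE
distributed_type: TPU
downcast_bf16: 'no'
machine_rank: 0
main_training_function: main
mixed_precision: 'no'
num_machines: 1
num_processes: {num_processes}
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
"""


def write_config(path, num_processes):
    """Write the config with the given num_processes and return its path."""
    path = Path(path)
    path.write_text(CONFIG_TEMPLATE.format(num_processes=num_processes))
    return path


def build_command(config_path):
    """Build the `accelerate test` invocation for a given config file."""
    return ["accelerate", "test", "--config_file", str(config_path)]


def run_repro(num_processes):
    """Run `accelerate test` with a temporary config; return (exit code, stderr)."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical file name, any path works.
        cfg = write_config(Path(tmp) / "tpu_config.yaml", num_processes)
        result = subprocess.run(build_command(cfg), capture_output=True, text=True)
    return result.returncode, result.stderr
```

Calling `run_repro(1)` and then `run_repro(2)` on the TPU VM should make it easy to compare the two cases and check stderr for the `Unsupported nprocs` warning.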

Does that mean XLA won’t be able to use all 8 cores, or am I missing something here?

Here’s the full error: E0409 00:24:55.875405434 3803 oauth2_credentials.cc:236] oauth_fetch: UNKNOWN:C-are...

Interesting,

stderr: WARNING:root:Unsupported nprocs (8), ignoring...

Does accelerate simply not have support for multi-core TPUs? :grin:

Also:

E0409 00:58:47.720978217   12750 oauth2_credentials.cc:236]            oauth_fetch: UNKNOWN:C-ares status is not ARES_SUCCESS qtype=A name=metadata.google.internal. is_balancer=0: Domain name not found {grpc_status:2, created_time:"2023-04-09T00:58:47.720960775+00:00"}

- `Accelerate` version: 0.18.0
- Platform: Linux-5.4.88+-x86_64-with-glibc2.2.5
- Python version: 3.8.16
- Numpy version: 1.24.2
- PyTorch version (GPU?): 2.0.0+cu117 (False)