Every H200 multi-GPU job I launch fails at CUDA initialization, before any model weights load. The error is:
```
RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 802: system not yet initialized
```
The failure occurs in vLLM’s `multiproc_executor.py` during `WorkerProc` initialization. I’ve now tested three different vLLM image versions (CUDA 12.x and CUDA 13 runtimes), and the error is identical in all of them. It does not depend on the model, the TP size, or the CUDA runtime version.
What I’ve confirmed:
| Setup | Result |
|---|---|
| `pytorch/pytorch:2.6.0-cuda12.4-cudnn9-devel` on h200x4, single process (`nvidia-smi` + `torch.cuda.device_count()`) | works, returns 4 |
| `vllm/vllm-openai:v0.19.1` on l4x4 | works end-to-end |
| `vllm/vllm-openai:v0.19.1` on h200x4, Qwen2.5-7B | fails with 802 (original run and one retry) |
| `vllm/vllm-openai:v0.19.1` on h200x8, GLM-4.5-Base | fails with 802 |
| `vllm/vllm-openai:cu130-nightly` on h200x4, Qwen2.5-7B | fails with 802 |
Plain single-process PyTorch works on the same h200x4 flavor, but every vLLM multi-process worker fails. That suggests the issue is specific to how the CUDA context is initialized inside spawned worker subprocesses on H200 nodes. The pattern matches the Fabric Manager / NVSwitch visibility problems described in:
- DigitalOcean Documentation, "How do I fix a 'system not initialized' error on multi-GPU Droplets?"
As an HF Jobs user, I can’t restart Fabric Manager or check whether the Fabric Manager version matches the driver version.
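To separate "CUDA initializes in one process" from "CUDA initializes in several freshly started processes at once" without vLLM in the loop, a minimal probe along these lines could be run inside the container. It is a sketch under stated assumptions, not vLLM's code: it starts one new Python interpreter per rank with `subprocess` (roughly a spawn-style start; vLLM's own start method may differ), and each child calls `torch.cuda.init()` and `torch.cuda.set_device(rank)`, which is where PyTorch would surface the `cudaGetDeviceCount()` error if initialization fails. `CUDA_PROBE` and `probe_in_fresh_processes` are hypothetical names.

```python
import subprocess
import sys

# Code each child runs (hypothetical probe, not vLLM code): initialize CUDA
# in a freshly started interpreter and bind to this worker's GPU.
CUDA_PROBE = """
import sys
import torch
rank = int(sys.argv[1])
torch.cuda.init()            # fails here if CUDA initialization fails
torch.cuda.set_device(rank)
print(f"rank {rank}: device_count={torch.cuda.device_count()}")
"""


def probe_in_fresh_processes(num_workers, code=CUDA_PROBE, timeout=120):
    """Start num_workers interpreters concurrently, each running `code` with its
    rank as argv[1]. Returns a list of (rank, returncode, combined_output)."""
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", code, str(rank)],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # keep tracebacks next to normal output
            text=True,
        )
        for rank in range(num_workers)
    ]
    results = []
    for rank, proc in enumerate(procs):
        # Raises subprocess.TimeoutExpired if a child hangs longer than `timeout`.
        out, _ = proc.communicate(timeout=timeout)
        results.append((rank, proc.returncode, out.strip()))
    return results


if __name__ == "__main__":
    n = int(sys.argv[1]) if len(sys.argv) > 1 else 4
    for rank, code, out in probe_in_fresh_processes(n):
        status = "ok" if code == 0 else f"FAILED (exit {code})"
        print(f"worker {rank}: {status}")
        print(out)
```

Run as, for example, `python probe.py 4` on an h200x4 job (the filename is illustrative). If single-process `device_count()` succeeds but some of these workers fail with 802, that would point away from vLLM itself and toward node-level CUDA/fabric setup.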
**Details:**
- Flavors: h200x8 and h200x4 (both fail)
- Host driver (confirmed via `nvidia-smi` inside h200x4 container): NVIDIA 580.126.09, CUDA 13.0, 4× H200 @ 143771 MiB
- Job IDs:
  - `elenaajayi/69e5aa28ac288e522d8f0179` (h200x8, GLM-4.5-Base, v0.19.1)
  - `elenaajayi/69e5ab1dac288e522d8f017d` (h200x4, Qwen2.5-7B, v0.19.1)
  - `elenaajayi/69e5ac7eac288e522d8f0181` (h200x4, Qwen2.5-7B, v0.19.1, retry)
  - `elenaajayi/69e61257ac288e522d8f0281` (h200x4, Qwen2.5-7B, cu130-nightly)
- Controls:
  - `elenaajayi/69e5a714ac288e522d8f0177` (l4x4, same `v0.19.1` image, runs clean)
  - `elenaajayi/69e5be88cd8c002f31dffddc` (h200x4, plain PyTorch; `nvidia-smi` and `torch.cuda.device_count()` succeed)
- Docker images tested: `vllm/vllm-openai:v0.19.1`, `vllm/vllm-openai:cu130-nightly`, `pytorch/pytorch:2.6.0-cuda12.4-cudnn9-devel`
- `huggingface_hub`: 0.26.2
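Since Fabric Manager can't be inspected directly from an HF Job, the closest in-container signal is probably what `nvidia-smi` reports. The sketch below prints per-GPU driver info with `--query-gpu` and then scans `nvidia-smi -q` for a `Fabric` block. That block and its `State`/`Status` fields are an assumption based on how recent drivers report NVSwitch fabric state; whether it appears, and its exact layout, may vary by driver and platform. `parse_fabric_sections` is a hypothetical helper.

```python
import shutil
import subprocess


def parse_fabric_sections(text):
    """Collect 'Key : Value' pairs under each 'Fabric' header in `nvidia-smi -q` text.

    ASSUMPTION: the block is an indented 'Fabric' line followed by more-indented
    'Key : Value' lines; the exact layout may differ between driver versions.
    """
    sections = []
    lines = text.splitlines()
    i = 0
    while i < len(lines):
        line = lines[i]
        i += 1
        if line.strip() != "Fabric":
            continue
        header_indent = len(line) - len(line.lstrip())
        fields = {}
        # Consume the lines indented deeper than the 'Fabric' header.
        while i < len(lines):
            child = lines[i]
            if not child.strip() or len(child) - len(child.lstrip()) <= header_indent:
                break
            key, sep, value = child.partition(":")
            if sep:  # skip sub-headers that carry no value
                fields[key.strip()] = value.strip()
            i += 1
        sections.append(fields)
    return sections


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found in this container")
    else:
        gpus = subprocess.run(
            ["nvidia-smi", "--query-gpu=index,name,driver_version,memory.total",
             "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        ).stdout
        print(gpus.strip())
        full = subprocess.run(
            ["nvidia-smi", "-q"], capture_output=True, text=True, check=True
        ).stdout
        # ASSUMPTION: one Fabric block per GPU, listed in GPU order.
        for idx, fields in enumerate(parse_fabric_sections(full)):
            print(idx, fields or "(empty Fabric block)")
```

An empty result would only mean no `Fabric` block was found in this driver's output, not that the fabric is healthy.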
Is the HF infrastructure team aware of this? Is there a timeline for a fix, or an alternative H200 flavor I can try? This is blocking a NeurIPS paper run.