It’s getting weirder and weirder…
The only unusual thing I can see is that your CUDA version is much older than the 12.4 that is common in the HF ecosystem, but I think it should still work with an older version…
And even if there isn’t enough VRAM, it should offload properly as long as accelerate is installed.
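
To rule out the accelerate part, something like this might help. It's only a rough sketch: `offload_kwargs` is a name I made up, and it just checks whether accelerate is importable before asking for `device_map="auto"`.

```python
import importlib.util


def offload_kwargs() -> dict:
    """Hypothetical helper: build from_pretrained kwargs for automatic placement.

    device_map="auto" relies on accelerate, so it is only requested
    when the package can be found in the current environment.
    """
    if importlib.util.find_spec("accelerate") is None:
        # accelerate is missing: fall back to default loading (no offload)
        return {}
    return {"device_map": "auto"}


print(offload_kwargs())

# Usage idea (not executed here, model_id is whatever you are loading):
#   from transformers import AutoModelForCausalLM
#   model = AutoModelForCausalLM.from_pretrained(model_id, **offload_kwargs())
```

If this prints an empty dict, accelerate isn't visible from the Python environment you are running in, which would explain why offloading doesn't kick in.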
I think trying to load some unrelated model onto the GPU might help isolate the problem.
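
Before even loading another model, a plain PyTorch smoke test could tell you whether the GPU is usable at all. This is a simpler stand-in for the "unrelated model" idea, not a replacement for it, and `gpu_smoke_test` is just a name I picked for illustration.

```python
import importlib.util


def gpu_smoke_test() -> dict:
    """Hypothetical helper: report whether PyTorch can run a tiny job on the GPU."""
    report = {
        "torch_installed": False,
        "cuda_build": None,       # CUDA version PyTorch was built against
        "cuda_available": False,
        "matmul_ok": False,
        "error": None,
    }
    if importlib.util.find_spec("torch") is None:
        return report
    try:
        import torch
    except Exception as exc:  # broken install, missing shared libraries, etc.
        report["error"] = repr(exc)
        return report

    report["torch_installed"] = True
    report["cuda_build"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()

    if report["cuda_available"]:
        try:
            # Small matrix multiply on the GPU; synchronize so errors surface here
            x = torch.randn(256, 256, device="cuda")
            y = x @ x
            torch.cuda.synchronize()
            report["matmul_ok"] = bool(torch.isfinite(y).all())
        except Exception as exc:
            report["error"] = repr(exc)
    return report


print(gpu_smoke_test())
```

If `matmul_ok` comes back `True`, the driver and PyTorch build are probably fine, and the next step would be loading a small unrelated model the same way you load the failing one. If it fails here, the problem is likely below the model level.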