`device_map="auto"` in MIG Instance


I’ve recently taken an interest in LLMs and have been experimenting with various things. I have access to A100 instances, so I’m trying to serve the Llama-2-70b-chat-hf model. I’m following the tutorial here,

but I’m encountering an error saying there isn’t enough GPU memory.

My instance has 4 A100 80GB cards, set up with the MIG 7g.80gb profile, which means each MIG instance uses the full resources of one GPU. (It might seem odd, but there were some constraints with the support I received.) So memory shouldn’t be an issue if the 70B model is distributed across them, but it seems to be trying to use only the first GPU.

While googling, I found that communication between MIG instances might not be possible. Could this be related to the issue?
(I did try changing the DDP backend from nccl to gloo for distributed inference in the torchrun code, since NCCL doesn’t seem to support MIG.)
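For reference, the backend swap I tried looks roughly like this (a single-process sanity check only; the rank, world size, and address values are placeholders, not my actual torchrun launch):

```python
import os
import torch.distributed as dist

# gloo communicates over sockets/shared memory, so it can initialize
# even where NCCL refuses to talk across MIG instances.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")

dist.init_process_group(backend="gloo", rank=0, world_size=1)
backend = dist.get_backend()  # "gloo"
dist.destroy_process_group()
```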

Does anyone have experience using multiple GPUs with MIG instances?

The code is here:

      self.model = AutoModelForCausalLM.from_pretrained(
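(The rest of the arguments were cut off above.) A minimal sketch of what I’m attempting, with hedged assumptions: the model ID, the per-device memory cap, and the dtype below are illustrative, and `build_load_kwargs` is a hypothetical helper, not part of my actual code:

```python
# Hypothetical helper: build from_pretrained kwargs intended to make
# device_map="auto" shard the checkpoint across every visible device
# instead of packing everything onto the first one.
def build_load_kwargs(num_devices: int, per_device: str = "76GiB") -> dict:
    # One entry per visible device, keyed by device index, plus a "cpu"
    # cap of zero so a failure surfaces as OOM rather than silent offload.
    max_memory = {i: per_device for i in range(num_devices)}
    max_memory["cpu"] = "0GiB"
    return {
        "device_map": "auto",
        "max_memory": max_memory,
        "torch_dtype": "float16",  # ~140GB total for 70B weights in fp16
    }

kwargs = build_load_kwargs(num_devices=4)
# self.model = AutoModelForCausalLM.from_pretrained(model_id, **kwargs)
```

Even with `max_memory` spelled out like this, the load still lands on only the first MIG instance, which is what makes me suspect the MIG visibility/communication limits.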