Device_map="auto" in MIG Instance

nick-bae · January 23, 2024, 4:55pm

Hello!

I’ve recently taken an interest in LLMs and have been experimenting with various things. I have access to A100 instances, so I’m trying to serve the Llama 2 70B chat HF model. I’m following the tutorial here, but I’m encountering an error that says there’s not enough GPU memory.

My instance has 4 A100 80GB cards, each set up with the MIG 7g.80gb profile. This means that each MIG instance uses the full resources of one GPU. (It might seem odd, but there were some constraints with the support I received.) So I don’t think memory should be an issue if the 70B model is distributed across the cards, but it seems to be trying to use only the first GPU.
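As a rough check on the memory reasoning above, here is a back-of-the-envelope sketch. It counts weights only (no activations, KV cache, or framework overhead), and both the nominal 70e9 parameter count and the 80 GiB per card are assumptions for illustration, not measured values.

```python
# Weight-only memory estimate; real usage will be higher.
GIB = 1024 ** 3


def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Return the size of the model weights in GiB."""
    return num_params * bytes_per_param / GIB


params = 70e9            # nominal parameter count for a "70B" model (assumption)
per_card_gib = 80        # nominal capacity of one A100 80GB card (assumption)
num_cards = 4

fp16_gib = weight_memory_gib(params, 2)  # 2 bytes per parameter in float16
int8_gib = weight_memory_gib(params, 1)  # 1 byte per parameter with 8-bit loading
total_gib = per_card_gib * num_cards

print(f"fp16 weights: {fp16_gib:.1f} GiB")
print(f"int8 weights: {int8_gib:.1f} GiB")
print(f"total across {num_cards} cards: {total_gib} GiB")
```

By this estimate the 8-bit weights alone come close to filling a single card, which would be consistent with an out-of-memory error if only the first GPU is actually being used.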

While googling, I found that communication between MIG instances might not be possible. Could this be related to the issue?
(I did try changing the DDP backend from NCCL to Gloo for distributed inference in the torchrun code, as NCCL doesn’t seem to support MIG.)
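One way to narrow this down might be to compare the MIG devices the driver reports with the devices the Python process can actually see. The sketch below parses the output of `nvidia-smi -L`; the line format matched by the regular expression is an assumption based on typical output and may differ between driver versions, and `parse_mig_devices` and `list_mig_devices` are hypothetical helper names.

```python
import re
import subprocess

# Assumed line shape: "  MIG 7g.80gb     Device  0: (UUID: MIG-xxxx)"
MIG_LINE = re.compile(
    r"^\s*MIG\s+(\S+)\s+Device\s+(\d+):\s+\(UUID:\s+(MIG-[^)]+)\)"
)


def parse_mig_devices(listing: str) -> list[tuple[str, str]]:
    """Extract (profile, uuid) pairs for each MIG device in the listing."""
    devices = []
    for line in listing.splitlines():
        match = MIG_LINE.match(line)
        if match:
            devices.append((match.group(1), match.group(3)))
    return devices


def list_mig_devices() -> list[tuple[str, str]]:
    """Run `nvidia-smi -L` and return the MIG devices it reports."""
    result = subprocess.run(
        ["nvidia-smi", "-L"], capture_output=True, text=True, check=True
    )
    return parse_mig_devices(result.stdout)
```

If `len(list_mig_devices())` is 4 but `torch.cuda.device_count()` returns 1 inside the serving process, then `device_map="auto"` only has one device to place layers on, regardless of how much memory the other cards have.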

Does anyone have experience using multiple GPUs with MIG instances?

Here is the code:

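    # Assumes `import torch` and `from transformers import AutoModelForCausalLM`
    self.model = AutoModelForCausalLM.from_pretrained(
        model_path,
        device_map="auto",          # let Accelerate place layers on the visible devices
        torch_dtype=torch.float16,  # dtype for modules that are not quantized
        load_in_8bit=True,          # quantize weights to 8-bit via bitsandbytes
        trust_remote_code=True,
    )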
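For anyone reproducing this, `from_pretrained` also accepts a `max_memory` dictionary alongside `device_map="auto"` to cap how much each device may hold. The helper below is a minimal sketch for building such a dictionary; `build_max_memory` is a hypothetical name and the 70 GiB budget in the usage note is an arbitrary illustrative value, not a recommendation.

```python
def build_max_memory(num_devices: int, per_device_gib: int, cpu_gib: int = 0) -> dict:
    """Build a max_memory mapping: GPU index -> budget string, plus optional CPU budget."""
    budget = {index: f"{per_device_gib}GiB" for index in range(num_devices)}
    if cpu_gib > 0:
        # Allow offloading to CPU RAM only when a budget is given explicitly.
        budget["cpu"] = f"{cpu_gib}GiB"
    return budget
```

It could be passed as `max_memory=build_max_memory(torch.cuda.device_count(), 70)`. This only helps if the process sees more than one device; with a single visible device the budget cannot spread the model any further.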
