Load a large model onto multiple, specific GPUs (without CUDA_VISIBLE_DEVICES)

Hi,

So I need to load multiple large models in a single script and control which GPUs each one is kept on. For example, let's say I want to load one LLM on the first 4 GPUs and another LLM on the last 4 GPUs. If I pass "auto" to device_map, it will always spread the model over all GPUs. I cannot use CUDA_VISIBLE_DEVICES, since I need all of the GPUs to be visible in the script.

For example, what would be the correct argument to device_map to load Llama 3.1 on GPUs 0, 1, 2, 3?

    import torch
    import transformers

    pipeline = transformers.pipeline(
        "text-generation",
        model="meta-llama/Llama-3.1-70B-Instruct",
        model_kwargs={"torch_dtype": torch.bfloat16},
        device_map=llm_device,  # what should go here to use only GPUs 0-3?
        token=ACCESS_TOKEN,
    )