How do I specify which GPU the input is loaded onto during inference with a Hugging Face pipeline in a multi-GPU setup?

I was successfully able to load a 34B model across 4 GPUs (NVIDIA L4) using the code below.

import torch
from transformers import AutoTokenizer, pipeline

model_id = "abacusai/Smaug-34B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)

pipe = pipeline(
    "text-generation",  # task
    model=model_id,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",  # shard the model across all visible GPUs
    temperature=0.0001,
    repetition_penalty=1.1,
    # device=0,
    eos_token_id=tokenizer.eos_token_id,
    return_full_text=False,
)
input_prompt = " <--my input prompt-->"
output = pipe(input_prompt)

But because my prompt is fairly long, I am getting a CUDA out-of-memory exception during inference.

Interestingly, I can see that my 4th GPU has enough space (around 5.5 GB of free VRAM) to hold the input, but because the pipeline tries to load the input onto GPU 1, it throws the exception below.
Is there any way to specify the target GPU for the inputs during inference? If not, how else should I tackle the problem of not using the available resources fully?
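
For illustration, this is the kind of workaround I am wondering about: capping how much of each GPU the weights may occupy via max_memory in model_kwargs, so that the GPU holding the inputs keeps some headroom. I have not verified that this is the right mechanism, and the per-GPU limits below are placeholder values:

import torch
from transformers import AutoTokenizer, pipeline

model_id = "abacusai/Smaug-34B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Hypothetical per-GPU caps for the model weights, leaving room on each
# 22 GiB L4 for inputs/activations. Passed through to from_pretrained.
max_memory = {0: "18GiB", 1: "18GiB", 2: "18GiB", 3: "18GiB"}

pipe = pipeline(
    "text-generation",
    model=model_id,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
    model_kwargs={"max_memory": max_memory},
    eos_token_id=tokenizer.eos_token_id,
    return_full_text=False,
)

Would something like this force the dispatcher to leave enough free VRAM on GPU 1, or is there a more direct way to pin the inputs to a specific GPU?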

Exception:
OutOfMemoryError: CUDA out of memory. Tried to allocate 1.31 GiB. GPU 1 has a total capacty of 21.96 GiB of which 2.88 MiB is free. Including non-PyTorch memory, this process has 21.95 GiB memory in use. Of the allocated memory 20.35 GiB is allocated by PyTorch, and 1.37 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF