I have 4 GPUs that I want to use to run Qwen2-VL models. Here is my script:
import torch
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
from qwen_vl_utils import process_vision_info

model_name = "Qwen/Qwen2-VL-2B-Instruct"
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_name)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": file},  # image path / PIL image loaded earlier
            {"type": "text", "text": "Describe the image"},
        ],
    }
]

text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
inputs = inputs.to(model.device)

with torch.no_grad():
    generated_ids = model.generate(**inputs, max_new_tokens=128)
But I always get:
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [36,0,0], thread: [32,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [36,0,0], thread: [33,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [36,0,0], thread: [34,0,0] Assertion `-sizes[i]
[many more ...]
device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions
I also tried running my Python script with CUDA_LAUNCH_BLOCKING=1 python script.py, but that didn't help either.
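For what it's worth, I understand what the assert itself means; here is a minimal CPU-only toy example (my own illustration, not the Qwen2-VL internals) of the class of bug it reports: an index tensor pointing outside the table it indexes into. On CPU the same operation raises a readable IndexError instead of an opaque device-side assert, which is why a CPU repro can help localize it:

```python
import torch

# Toy reproduction (on CPU) of the failure class behind the CUDA assert:
# an index tensor referencing a row that does not exist in an embedding table.
emb = torch.nn.Embedding(10, 4)     # table with 10 rows (valid indices 0..9)
bad_ids = torch.tensor([3, 9, 12])  # 12 is out of bounds
try:
    emb(bad_ids)
except IndexError as e:
    print("IndexError:", e)        # readable stack trace on CPU
```

In my case I can't tell which index tensor is out of bounds, since the assert fires inside the fused CUDA kernel.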
My transformers and PyTorch versions are:
transformers==4.45.0.dev0
torch==2.4.0
Does anyone know how to fix this?