I’m trying to load the Qwen2.5-VL-7B-Instruct model from Hugging Face with 4-bit weight-only quantization using TorchAoConfig (similar to the example in the documentation here), but I’m getting a runtime error related to CUDA.
Code:
from transformers import Qwen2_5_VLForConditionalGeneration, TorchAoConfig, AutoProcessor
import torch
torch.cuda.empty_cache()
quantization_config = TorchAoConfig("int4_weight_only", group_size=128)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-7B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    quantization_config=quantization_config,
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct")
I got the following error:
RuntimeError Traceback (most recent call last)
/tmp/ipython-input-9-2218636408.py in <cell line: 0>()
13
14 quantization_config = TorchAoConfig("int4_weight_only", group_size=128)
---> 15 model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
16 "Qwen/Qwen2.5-VL-7B-Instruct",
17 torch_dtype=torch.bfloat16,
(12 frames hidden)
/usr/local/lib/python3.11/dist-packages/torchao/quantization/utils.py in pack_tinygemm_scales_and_zeros(scales, zeros, dtype)
356 guard_dtype_size(zeros, "zeros", dtype=dtype)
357 return (
--> 358 torch.cat(
359 [
360 scales.reshape(scales.size(0), scales.size(1), 1),
RuntimeError: CUDA error: named symbol not found
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
I’m new to this and am probably missing something simple. Any help or insights would be appreciated!
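For reference, here is a quick check of my environment (PyTorch/torchao versions and the GPU's compute capability), in case the int4 tinygemm kernels depend on the GPU architecture or bf16 support — I'm not sure whether that's the cause:

import torch
import torchao

# Report library versions and GPU details, which may be relevant
# if the int4 weight-only kernels require a specific GPU architecture
print("torch:", torch.__version__)
print("torchao:", torchao.__version__)
print("CUDA build:", torch.version.cuda)
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
    print("compute capability:", torch.cuda.get_device_capability(0))
    print("bf16 supported:", torch.cuda.is_bf16_supported())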