Hello community!
I'm currently trying to fine-tune HuggingFaceH4/zephyr-7b-beta by splitting the model across multiple GPUs.
I have four RTX 4090s available, but I get an error during inference/training when I load the model with device_map="auto".
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "HuggingFaceH4/zephyr-7b-beta"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

x = tokenizer("Hello World", return_tensors="pt").to("cuda")
with torch.no_grad():
    model(**x)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
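For completeness, this is roughly how the debugging flag from the error message can be enabled (a minimal sketch; CUDA_LAUNCH_BLOCKING has to be set before CUDA is initialized, and TORCH_USE_CUDA_DSA is a compile-time option for PyTorch rather than something I can toggle at runtime):

import os

# Make kernel launches synchronous so the reported stack trace points at the
# op that actually triggered the assert. Must be set before torch touches CUDA
# (alternatively: CUDA_LAUNCH_BLOCKING=1 python script.py on the command line).
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch
# ... same loading/inference code as above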
Strangely enough, I can move individual layers to other GPUs (at most one layer per GPU) without getting this error. As soon as I move a second layer onto the same GPU, the error appears.
Works fine:
device_map = {
    'model.embed_tokens': 0,
    'model.layers.0': 0,
    'model.layers.1': 0,
    'model.layers.2': 0,
    'model.layers.3': 1,  # 1
    'model.layers.4': 3,  # 3
    'model.layers.5': 2,  # 2
    'model.layers.6': 0,
    ...
    'model.layers.31': 0,
    'model.norm': 1,  # 1
    'lm_head': 1  # 1
}
Not working:
device_map = {
    'model.embed_tokens': 0,
    'model.layers.0': 0,
    'model.layers.1': 0,
    'model.layers.2': 0,
    'model.layers.3': 3,  # 3 <------------------ changed to 3
    'model.layers.4': 3,  # 3
    'model.layers.5': 2,  # 2
    'model.layers.6': 0,
    ...
    'model.layers.31': 0,
    'model.norm': 1,  # 1
    'lm_head': 1  # 1
}
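For reference, a minimal sketch of how the explicit map is passed and how the resulting placement can be inspected (model.hf_device_map is populated when a device_map is given):

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map=device_map,  # the explicit dict above
)

# Check where each module actually ended up.
print(model.hf_device_map)

# Inputs stay on GPU 0 because model.embed_tokens sits on GPU 0; accelerate's
# hooks move the hidden states between GPUs during the forward pass.
x = tokenizer("Hello World", return_tensors="pt").to("cuda:0")
with torch.no_grad():
    out = model(**x)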
I would be very grateful for any help or ideas.
Kind regards,
Christopher