CUDA error: device-side assert triggered on device_map="auto"

Hello community!

I'm currently trying to fine-tune HuggingFaceH4/zephyr-7b-beta with the model split across multiple GPUs.
I have 4 RTX 4090s available, but I get an error during inference/training when I load the model with device_map="auto".

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "HuggingFaceH4/zephyr-7b-beta"
tokenizer = AutoTokenizer.from_pretrained(model_id)
device_map = "auto"  # or one of the explicit dicts shown further down
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map=device_map)

x = tokenizer("Hello World", return_tensors="pt").to("cuda")
with torch.no_grad():
    model(**x)

RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
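
The error message above suggests CUDA_LAUNCH_BLOCKING=1. A minimal sketch of how that could be set from inside the script, assuming it runs before the first CUDA call (safest is before torch is imported at all). Kernel launches then run synchronously, so the stack trace should point closer to the operation that actually failed:

```python
import os

# Set this before the first CUDA call; the safest place is at the very top
# of the script, before torch is imported.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch  # import torch only after the variable is set
# ... then load the model and run the forward pass as in the snippet above
```

Setting it in the shell when launching the script (CUDA_LAUNCH_BLOCKING=1 in front of the python command) should have the same effect.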

Strangely enough, I can place individual layers on other GPUs (one layer per GPU) without getting this error. As soon as I put a second layer on the same GPU, the error occurs.

Works fine:

device_map = {
    'model.embed_tokens': 0,
    'model.layers.0': 0,
    'model.layers.1': 0,
    'model.layers.2': 0,
    'model.layers.3': 1, # 1
    'model.layers.4': 3, # 3
    'model.layers.5': 2, # 2
    'model.layers.6': 0,
    ...
    'model.layers.31': 0,
    'model.norm': 1, # 1
    'lm_head': 1 # 1
}

Not working:

device_map = {
    'model.embed_tokens': 0,
    'model.layers.0': 0,
    'model.layers.1': 0,
    'model.layers.2': 0,
    'model.layers.3': 3, # 3 <------------------ changed to 3
    'model.layers.4': 3, # 3
    'model.layers.5': 2, # 2
    'model.layers.6': 0,
    ...
    'model.layers.31': 0,
    'model.norm': 1, # 1
    'lm_head': 1 # 1
}
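
To make experiments like the two maps above easier to repeat, here is a small sketch that builds them programmatically and prints the entries in which they differ. The helper name build_device_map and its parameters are hypothetical, not part of transformers; the 32 layers and the module names are taken from the maps above.

```python
def build_device_map(layer_overrides=None, num_layers=32,
                     default_device=0, head_device=1):
    """Build an explicit device_map dict (hypothetical helper).

    layer_overrides maps a layer index to a GPU index; every other layer
    goes to default_device. model.norm and lm_head go to head_device.
    """
    layer_overrides = layer_overrides or {}
    device_map = {"model.embed_tokens": default_device}
    for i in range(num_layers):
        device_map[f"model.layers.{i}"] = layer_overrides.get(i, default_device)
    device_map["model.norm"] = head_device
    device_map["lm_head"] = head_device
    return device_map


# The two configurations from this post
working = build_device_map({3: 1, 4: 3, 5: 2})
failing = build_device_map({3: 3, 4: 3, 5: 2})

# Entries that differ between the two maps: (working device, failing device)
diff = {k: (working[k], failing[k]) for k in working if working[k] != failing[k]}
print(diff)
```

Either dict can be passed as device_map to from_pretrained in place of "auto".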

I would be very grateful for any help or ideas.

Kind regards,
Christopher

I finally had the opportunity to test the code on another cluster (4xA100), and the same error occurs there.

@CKeibel I am also getting the same error, in my case with a summarization pipeline.

Unfortunately, I don't have any update yet either.
I haven't gotten it to run yet.