Hello community!
I'm currently trying to fine-tune HuggingFaceH4/zephyr-7b-beta by splitting the model across multiple GPUs.
I have four RTX 4090s available, but I get an error during inference/training when I load the model with device_map="auto".
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "HuggingFaceH4/zephyr-7b-beta"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

x = tokenizer("Hello World", return_tensors="pt").to("cuda")
with torch.no_grad():
    model(**x)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
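For completeness, this is roughly how the debugging flag from the error message can be enabled (a minimal sketch; CUDA_LAUNCH_BLOCKING has to be set before CUDA is initialized, and TORCH_USE_CUDA_DSA is a compile-time option for PyTorch rather than something I can toggle at runtime):

import os

# Make kernel launches synchronous so the reported stack trace points at the
# op that actually triggered the assert. Must be set before torch touches CUDA
# (alternatively: CUDA_LAUNCH_BLOCKING=1 python script.py on the command line).
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch
# ... same loading/inference code as above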
Strangely enough, I can move individual layers to other GPUs (at most one layer per GPU) without getting this error. As soon as I move a second layer onto the same GPU, the error appears.
Works fine:
device_map = {
    'model.embed_tokens': 0,
    'model.layers.0': 0,
    'model.layers.1': 0,
    'model.layers.2': 0,
    'model.layers.3': 1,  # 1
    'model.layers.4': 3,  # 3
    'model.layers.5': 2,  # 2
    'model.layers.6': 0,
    ...
    'model.layers.31': 0,
    'model.norm': 1,  # 1
    'lm_head': 1  # 1
}
Not working:
device_map = {
    'model.embed_tokens': 0,
    'model.layers.0': 0,
    'model.layers.1': 0,
    'model.layers.2': 0,
    'model.layers.3': 3,  # 3 <------------------ changed to 3
    'model.layers.4': 3,  # 3
    'model.layers.5': 2,  # 2
    'model.layers.6': 0,
    ...
    'model.layers.31': 0,
    'model.norm': 1,  # 1
    'lm_head': 1  # 1
}
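For reference, a minimal sketch of how the explicit map is passed and how the resulting placement can be inspected (model.hf_device_map is populated when a device_map is given):

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map=device_map,  # the explicit dict above
)

# Check where each module actually ended up.
print(model.hf_device_map)

# Inputs stay on GPU 0 because model.embed_tokens sits on GPU 0; accelerate's
# hooks move the hidden states between GPUs during the forward pass.
x = tokenizer("Hello World", return_tensors="pt").to("cuda:0")
with torch.no_grad():
    out = model(**x)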
I would be very grateful for any help or ideas.
Kind regards,
Christopher