StarCoder: CUDA out of memory

I am trying to run the

bigcode/starcoder

model on Amazon EC2. I have 8 Tesla T4 GPUs with 16 GB of memory each, but I still encounter a "CUDA out of memory" error.

What are possible solutions to this issue? Thank you.

Code Snapshot:

from transformers import AutoModelForCausalLM, AutoTokenizer
from accelerate import Accelerator
accelerator = Accelerator()
checkpoint = "bigcode/starcoder"
device = accelerator.device  # for GPU usage or "cpu" for CPU usage
print('Reached device Selection.')
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
print('Tokens generated.')
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)
accelerator.prepare(model)
inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to(device)
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))

Error:

Same issue here.

Hi!
I searched around, and it seems that accelerator.device always points to cuda:0, so .to(device) loads the entire model onto a single GPU instead of distributing it across all eight. StarCoder has about 15.5B parameters, which is roughly 31 GB of weights even in fp16, so it can never fit on one 16 GB T4.

Maybe this helps you.
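
As a quick check, running the snippet below as a plain Python script (i.e. not launched with accelerate launch) shows that Accelerator exposes only a single device per process; this is just a sketch to illustrate the point:

from accelerate import Accelerator

accelerator = Accelerator()
# In a single-process run this prints one device, e.g. "cuda" (i.e. cuda:0),
# so .to(accelerator.device) places the whole model on that GPU alone.
print(accelerator.device)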

Is this a limitation of most LLMs, that they can't scale across multiple GPUs?

AutoModelForCausalLM.from_pretrained has a device_map option; set device_map="auto" and it will split the model's layers across all available devices.

Correct:
model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto")

Note: don't chain .to(device) after this. A model dispatched with device_map is already placed on its devices, and moving it again will fail or undo the sharding.
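
Putting it together, here is a minimal sketch of the full script. The torch_dtype=torch.float16 and max_new_tokens values are my own assumptions, not from the original post; fp16 halves the weight memory, so the ~31 GB of weights shard comfortably across 8x16 GB:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starcoder"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# device_map="auto" shards the layers across all visible GPUs;
# loading in fp16 (an assumption) cuts weight memory in half.
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    device_map="auto",
    torch_dtype=torch.float16,
)

# Send inputs to the device holding the first shard (model.device, usually cuda:0).
inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0]))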