I am trying to run the bigcode/starcoder model on Amazon EC2. The instance has 8 Tesla T4 GPUs with 16 GB of memory each, but I still get a "CUDA out of memory" error.
What are possible solutions to this issue? Thank you.
Code Snapshot:
from transformers import AutoModelForCausalLM, AutoTokenizer
from accelerate import Accelerator
accelerator = Accelerator()
checkpoint = "bigcode/starcoder"
device = accelerator.device  # for GPU usage or "cpu" for CPU usage
print('Reached device Selection.')
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
print('Tokens generated.')
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)
accelerator.prepare(model)
inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to(device)
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
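In case it helps narrow this down, here is a rough estimate of the memory needed for the weights alone. It assumes StarCoder has roughly 15.5 billion parameters (the figure I believe the model card gives, not something measured here) and ignores activations and the generation cache. `weight_memory_gb` and the constants are names made up for this sketch.

```python
# Back-of-the-envelope estimate of weight memory only.
# ASSUMPTION: ~15.5e9 parameters for bigcode/starcoder (model card figure).
N_PARAMS = 15.5e9
GPU_MEMORY_GB = 16  # one Tesla T4
N_GPUS = 8

# Bytes used to store one parameter in each precision.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_gb(n_params, dtype):
    """Return the memory needed for the weights, in GB (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    need = weight_memory_gb(N_PARAMS, dtype)
    # "Fits" here means weights only, with no headroom for activations.
    fits_one = need <= GPU_MEMORY_GB
    fits_all = need <= GPU_MEMORY_GB * N_GPUS
    print(f"{dtype}: {need:.1f} GB, one GPU: {fits_one}, all {N_GPUS} GPUs: {fits_all}")
```

If the estimate holds, the default float32 weights would not fit on a single 16 GB T4, which would match the error, since `.to(device)` places the whole model on one GPU. Sharding the model across the GPUs (for example `device_map="auto"` in `from_pretrained`) or loading it in half precision might be worth trying, though I have not verified either here.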
Error: