CUDA Memory Error While Trying to Run Bloom Locally

Hey, so I’m trying to get Bloom running locally. I don’t do any AI coding or Python, so it’s tough, and I ran into a roadblock.

from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed
import torch

torch.set_default_tensor_type(torch.cuda.FloatTensor)

print("downloading model")

model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-1b3", use_cache=True)
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-1b3")


print("done")

set_seed(416136942)

model.__class__.__name__

prompt = 'Good morning...'

input_ids = tokenizer(prompt, return_tensors="pt").to(0)

sample = model.generate(**input_ids, max_length=50, top_k=0, temperature=0.9)

print(tokenizer.decode(sample[0], truncate_before_pattern=[r"\n\n^#", "^'''", "\n\n\n"]))

I get this error:

RuntimeError: CUDA out of memory. Tried to allocate 1.91 GiB (GPU 0; 8.00 GiB total capacity; 6.42 GiB already allocated; 194.69 MiB free; 6.42 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

When Googling the error, I see a lot of people saying to “decrease the batch size”. I don’t even know where Hugging Face downloads and stores Bloom, so I can’t find where to edit this batch size param, or where in the code I’d find it even if I could find the code, or whether it would even solve the error.

I’m running a copy/paste of this code, which works in the Colab, so it’s an issue with my GPU: Google Colab
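For anyone reading along, here is a rough sketch of the arithmetic behind that error message. The memory figures are copied from the error text. The parameter count is only an assumption read off the "1b3" in the model name, and the helper names (`weights_gib`, `fits`) are made up for illustration, so treat the estimates as ballpark numbers rather than what PyTorch actually allocates.

```python
# Back-of-the-envelope check of the numbers in the CUDA OOM message.
GIB = 2**30
MIB = 2**20

# Figures copied from the error message above.
TOTAL_BYTES = 8.00 * GIB
ALLOCATED_BYTES = 6.42 * GIB
FREE_BYTES = 194.69 * MIB
REQUESTED_BYTES = 1.91 * GIB

# ASSUMPTION: roughly 1.3 billion parameters, guessed from "bloom-1b3".
ASSUMED_PARAMS = 1.3e9


def weights_gib(n_params, bytes_per_param):
    """Estimate the memory needed for the weights alone, in GiB."""
    return n_params * bytes_per_param / GIB


def fits(requested_bytes, free_bytes):
    """Return True if the requested allocation is no larger than free memory."""
    return requested_bytes <= free_bytes


if __name__ == "__main__":
    print(f"requested {REQUESTED_BYTES / GIB:.2f} GiB, free {FREE_BYTES / GIB:.2f} GiB")
    print("allocation fits:", fits(REQUESTED_BYTES, FREE_BYTES))
    # float32 uses 4 bytes per value, float16 uses 2.
    print(f"float32 weights (estimate): {weights_gib(ASSUMED_PARAMS, 4):.2f} GiB")
    print(f"float16 weights (estimate): {weights_gib(ASSUMED_PARAMS, 2):.2f} GiB")
```

The point of the sketch is that the failed request (1.91 GiB) is far larger than what was left on the card (194.69 MiB), and that the size of the weights scales directly with the bytes used per value. Only one prompt is passed to the tokenizer in the code above, so the batch size there is already 1.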