RuntimeError: CUDA out of memory even with simple inference

Hi everyone,

I am trying to use the pre-trained DistilBERT model to perform sentiment analysis on some stock data. When I feed the input sentences to the model, though, I get the following error:

RuntimeError: CUDA out of memory. Tried to allocate 968.00 MiB (GPU 0; 11.17 GiB total capacity; 8.86 GiB already allocated; 869.81 MiB free; 9.73 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I have read in other answers that a possible solution is to lower the batch size during training, but this happens to me even when running under torch.no_grad(). Also, the maximum sentence length is only 61 tokens, so I don't think the issue lies there. Any idea where the problem could be? I am new to PyTorch and deep learning in general, so forgive me if this is a basic question.

Here is the code:

import numpy as np
import pandas as pd
import torch
import transformers as tsf
from transformers import AutoTokenizer

model_class = tsf.DistilBertModel
model = model_class.from_pretrained('distilbert-base-uncased')
tokenizer = AutoTokenizer.from_pretrained('distilbert-base-uncased')
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
model.to(device)

df = pd.read_csv("stock_data.csv", delimiter=";")
df = df.dropna()
df['Text'] = df['Text'].astype(str)
text_list = df['Text'].values.tolist()

tokenized = df['Text'].apply((lambda x: tokenizer.encode(x, add_special_tokens=True)))
max_len = 0
for i in tokenized.values:
    if len(i) > max_len:
        max_len = len(i)

padded = np.array([i + [0]*(max_len-len(i)) for i in tokenized.values])
attention_mask = np.where(padded != 0, 1, 0)

input_ids = torch.tensor(padded, device=device)
attention_mask = torch.tensor(attention_mask, device=device)

with torch.no_grad():
    last_hidden_states = model(input_ids, attention_mask=attention_mask)
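
In case it helps, here is a quick check I can add right before the forward pass to see how much is sent to the GPU at once (the first dimension of input_ids is the full number of rows in the CSV):

# how big are the tensors going into the single forward pass?
print(input_ids.shape)          # (number of rows in stock_data.csv, max_len)
print(torch.cuda.memory_allocated() / 1024**2, "MiB already allocated on the GPU")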

Thank you in advance

Hi @totoro02,

I faced the same error when fine-tuning another model, but in my case I needed to lower the batch size from 64 to 16.

I did not apply torch.no_grad().

In your case, what was the lowest batch size you tried?
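
Also, since you are only running inference, another option is to process the input in smaller chunks instead of one big forward pass. A rough sketch, reusing your variable names (16 is an arbitrary choice, and depending on your transformers version the output may be a tuple, in which case you would use out[0] instead of out.last_hidden_state):

batch_size = 16  # arbitrary; go lower if it still runs out of memory
all_hidden = []
with torch.no_grad():
    for ids, mask in zip(input_ids.split(batch_size), attention_mask.split(batch_size)):
        out = model(ids, attention_mask=mask)
        # move each chunk's hidden states back to the CPU so they don't accumulate on the GPU
        all_hidden.append(out.last_hidden_state.cpu())
last_hidden_states = torch.cat(all_hidden)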