RuntimeError: CUDA out of memory even with simple inference

Hi everyone,

I am trying to use the pre-trained DistilBERT model to perform sentiment analysis on some stock data. When I feed the input sentences to the model, though, I get the following error:

"RuntimeError: CUDA out of memory. Tried to allocate 968.00 MiB (GPU 0; 11.17 GiB total capacity; 8.86 GiB already allocated; 869.81 MiB free; 9.73 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF"

I have read in other answers that a possible solution is to lower the batch size during training, but this happens to me even when running under torch.no_grad(). Also, the maximum sentence length is 61, so I don’t think the issue lies there. Any idea where the problem could lie? I am new to PyTorch and deep learning in general, so forgive me for the lame question.

Here is the code:

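import numpy as np
import pandas as pd
import torch
import transformers as tsf
from transformers import AutoTokenizer

model_class = tsf.DistilBertModel
model = model_class.from_pretrained('distilbert-base-uncased')
tokenizer = AutoTokenizer.from_pretrained('distilbert-base-uncased')
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
model = model.to(device)  # the model must be on the same device as its inputs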

df = pd.read_csv("stock_data.csv",delimiter =";")
df = df.dropna()
df['Text'] = df['Text'].astype(str)
text_list = df['Text'].values.tolist()

tokenized = df['Text'].apply(lambda x: tokenizer.encode(x, add_special_tokens=True))
# length of the longest tokenized sentence, used as the padding target
max_len = max(len(i) for i in tokenized.values)

padded = np.array([i + [0]*(max_len-len(i)) for i in tokenized.values])
attention_mask = np.where(padded != 0, 1, 0)

input_ids = torch.tensor(padded, device=device)
attention_mask = torch.tensor(attention_mask, device=device)

with torch.no_grad():
    last_hidden_states = model(input_ids, attention_mask=attention_mask)

Thank you in advance

Hi @totoro02,

I faced the same error while fine-tuning another model, but in my case I needed to lower the batch size from 64 to 16.

I did not apply torch.no_grad().

In your case, what was the lowest batch size you tried?
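
Looking at the posted code, input_ids holds every row of the CSV, so the single model(...) call is effectively one batch the size of the whole dataset. torch.no_grad() stops PyTorch from keeping activations around for a backward pass, but the forward activations for all rows still have to fit on the GPU at the same time. Below is a minimal sketch of running the forward pass in chunks instead. The function name embed_in_batches and the default batch_size=32 are my own placeholders, not anything from the original code, and I am assuming the model returns the last hidden state as its first output (which DistilBertModel does).

```python
import torch


def embed_in_batches(model, input_ids, attention_mask, batch_size=32, device="cpu"):
    """Run `model` over the rows of `input_ids` in chunks of `batch_size`.

    The full tensors stay on the CPU. Only one chunk at a time is moved to
    `device`, and each result is moved back to the CPU before the next chunk.
    """
    model = model.to(device).eval()  # eval() disables dropout for inference
    outputs = []
    with torch.no_grad():
        for start in range(0, input_ids.shape[0], batch_size):
            ids = input_ids[start:start + batch_size].to(device)
            mask = attention_mask[start:start + batch_size].to(device)
            # Assumption: the first element of the output is the last hidden state
            hidden = model(ids, attention_mask=mask)[0]
            outputs.append(hidden.cpu())
    # Shape: (number of rows, sequence length, hidden size)
    return torch.cat(outputs)
```

To use it with the code above, the tensors would be built on the CPU, i.e. torch.tensor(padded) and torch.tensor(attention_mask) without device='cuda', and then passed as embed_in_batches(model, input_ids, attention_mask, batch_size=32, device=device). If 32 still runs out of memory, halving it until it fits is a reasonable way to find a workable value.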