RuntimeError: CUDA out of memory. Tried to allocate 11.53 GiB (GPU 0; 15.90 GiB total capacity; 4.81 GiB already allocated; 8.36 GiB free; 6.67 GiB reserved in total by PyTorch)

jolurf · April 19, 2021, 6:43pm

After I run trainer.train() and then try to compute the WER of the model, I always get this error.
How can I solve it?

The code is below:
    def predict(batch, model):
        input_dict = processor(batch["input_values"], sampling_rate=16000, return_tensors='pt', padding=True)
        logits = model(input_dict.input_values.to(device)).logits
        pred_ids = torch.argmax(logits, dim=-1)[0]
        batch['pred_ids'] = processor.decode(pred_ids)
        return batch

    from transformers import TrainingArguments
    from transformers import Wav2Vec2ForCTC

    model = Wav2Vec2ForCTC.from_pretrained(
        model_name,
        attention_dropout=0.1,
        hidden_dropout=0.1,
        feat_proj_dropout=0.0,
        mask_time_prob=0.05,
        layerdrop=0.1,
        gradient_checkpointing=True,
        ctc_loss_reduction="mean",
        pad_token_id=processor.tokenizer.pad_token_id,
        vocab_size=len(processor.tokenizer)
    )

    training_args = TrainingArguments(
        output_dir="/content/gdrive/MyDrive/wav2vec2-large-xlsr-portuguese-demo/modelo",
        # output_dir="./wav2vec2-large-xlsr-portuguese-demo",
        group_by_length=True,
        per_device_train_batch_size=16,
        gradient_accumulation_steps=2,
        evaluation_strategy="steps",
        num_train_epochs=5,
        fp16=True,
        save_steps=400,
        eval_steps=400,
        logging_steps=400,
        learning_rate=3e-4,
        warmup_steps=500,
        save_total_limit=2,
    )

    from transformers import Trainer

    trainer = Trainer(
        model=model,
        data_collator=data_collator,
        args=training_args,
        compute_metrics=compute_metrics,
        train_dataset=d_train,
        eval_dataset=d_val,
        tokenizer=processor.feature_extractor,
    )

If you want to access the whole project, it is available at:

zanderbush · April 19, 2021, 7:35pm

Try this:

    import gc
    import torch

    gc.collect()
    torch.cuda.empty_cache()
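Clearing the cache can only release memory that no live tensor still refers to. A complementary step, sketched below, is to run prediction in evaluation mode under torch.no_grad(), so the forward pass does not keep activations for a backward pass. This is a minimal sketch, not the fix confirmed in this thread: the nn.Linear model and the predict_ids helper are hypothetical stand-ins so the example runs without the Wav2Vec2 checkpoint.

```python
import torch
from torch import nn

# Hypothetical stand-in for the fine-tuned model, so the sketch runs anywhere.
model = nn.Linear(8, 4)
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)


def predict_ids(inputs, model):
    """Return argmax ids without building an autograd graph."""
    model.eval()  # switch off dropout and other training-only behaviour
    with torch.no_grad():  # do not store activations for a backward pass
        logits = model(inputs.to(device))
    # Move the result to the CPU so no GPU tensor outlives the call.
    return torch.argmax(logits, dim=-1).cpu()


pred_ids = predict_ids(torch.randn(2, 8), model)
```

In the predict function from the question, the same pattern would mean wrapping the model(...) call in the with torch.no_grad(): block. Whether that is enough depends on the length of the audio being fed in, which the thread does not state.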

jolurf · April 19, 2021, 8:28pm

It isn’t working :’( and the data is not that large

unknownTransformer · April 20, 2021, 5:47am

I had the same problem, and restarting my notebook kernel helped.
Another time I got that problem I had another notebook project open; closing it and restarting my “main” notebook helped there.
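To check whether another process is likely to be the culprit, it can help to read the numbers in the error message itself. The sketch below parses the message from the title and compares the figures. The parse_oom helper is hypothetical and only handles values given in GiB, and treating "total minus free minus reserved" as memory held outside this PyTorch process (CUDA context, other notebooks) is an assumption about how the message is composed.

```python
import re

message = (
    "RuntimeError: CUDA out of memory. Tried to allocate 11.53 GiB "
    "(GPU 0; 15.90 GiB total capacity; 4.81 GiB already allocated; "
    "8.36 GiB free; 6.67 GiB reserved in total by PyTorch)"
)


def parse_oom(msg):
    """Extract the GiB figures from a CUDA out-of-memory message."""
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) GiB",
        "total": r"([\d.]+) GiB total capacity",
        "allocated": r"([\d.]+) GiB already allocated",
        "free": r"([\d.]+) GiB free",
        "reserved": r"([\d.]+) GiB reserved",
    }
    return {name: float(re.search(p, msg).group(1)) for name, p in patterns.items()}


stats = parse_oom(message)
# How much more memory the single request needs than the GPU has free.
shortfall = stats["requested"] - stats["free"]
# Memory that is neither free nor reserved by this PyTorch process.
outside = stats["total"] - stats["free"] - stats["reserved"]
print(f"shortfall: {shortfall:.2f} GiB, outside PyTorch: {outside:.2f} GiB")
```

Here a single request of 11.53 GiB exceeds the 8.36 GiB that is free by about 3.17 GiB, while under the assumption above less than 1 GiB is held outside PyTorch. That points at the size of the one allocation more than at a second notebook occupying the GPU.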

IamAdiSri · April 20, 2021, 6:54am

You’re running out of memory for whatever reason. You can try making your batch size smaller and using gradient accumulation to compensate. You can also try mixed precision training. These terms are explained in the documentation for the Trainer class.
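The trade between batch size and gradient accumulation is simple arithmetic, sketched below with the values from the question. The effective_batch_size helper and the "smaller" setting are hypothetical, chosen only to show that the number of samples per optimizer step can stay the same while each forward pass handles fewer of them.

```python
def effective_batch_size(per_device_train_batch_size,
                         gradient_accumulation_steps,
                         num_devices=1):
    """Number of samples that contribute to each optimizer step."""
    return per_device_train_batch_size * gradient_accumulation_steps * num_devices


# Values from the TrainingArguments posted in the question.
original = {"per_device_train_batch_size": 16, "gradient_accumulation_steps": 2}
# Hypothetical alternative: a quarter of the batch, four times the accumulation.
smaller = {"per_device_train_batch_size": 4, "gradient_accumulation_steps": 8}

original_effective = effective_batch_size(**original)
smaller_effective = effective_batch_size(**smaller)
print(original_effective, smaller_effective)  # prints "32 32"
```

The keys match real TrainingArguments parameters, so a dictionary like smaller could be unpacked into the existing call. Note that the question already sets fp16=True and gradient_accumulation_steps=2, and that the error is reported while predicting, so per_device_eval_batch_size may matter as much as the training batch size.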
