Hey guys
I'm currently getting an insufficient GPU memory error with the config below, training on 8 x V100 GPUs.
It doesn't appear immediately, though, but rather non-deterministically and far into training, which points to a memory leak somewhere. Do you have any tips or ideas on how to approach this?
Any ideas?
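One way to test the leak hypothesis is to record `torch.cuda.memory_allocated()` at regular intervals (for example at every logging step, i.e. every 400 steps with `logging_steps=400`) and check whether the baseline keeps climbing. The collection part is not shown here. The helpers `memory_trend` and `looks_like_leak` below are hypothetical names, and the 0.05 GiB per 1,000 steps threshold is an arbitrary assumption. This is a minimal sketch: a flat baseline with occasional OOMs would point more toward individual large batches or allocator fragmentation than toward a leak.

```python
def memory_trend(steps, allocated_gib):
    """Least-squares slope of allocated GPU memory vs. training step (GiB per step)."""
    n = len(steps)
    if n < 2 or n != len(allocated_gib):
        raise ValueError("need at least two paired (step, memory) readings")
    mean_x = sum(steps) / n
    mean_y = sum(allocated_gib) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(steps, allocated_gib))
    var = sum((x - mean_x) ** 2 for x in steps)
    if var == 0:
        raise ValueError("readings must come from at least two distinct steps")
    return cov / var


def looks_like_leak(steps, allocated_gib, tol_gib_per_1k_steps=0.05):
    """Flag a steady upward trend larger than the (assumed) tolerance per 1,000 steps."""
    return memory_trend(steps, allocated_gib) * 1000 > tol_gib_per_1k_steps


# Made-up readings for illustration only, one per logging step.
steps = [400, 800, 1200, 1600]
readings = [10.0, 10.5, 11.0, 11.5]
print(looks_like_leak(steps, readings))
```

`memory_allocated()` only counts tensors that are currently alive. Comparing it with `torch.cuda.memory_reserved()` (memory held by PyTorch's caching allocator) can help tell a genuine leak apart from fragmentation.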
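```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./wav2vec2-xlsr-sg-g",
    logging_dir="./logs",
    group_by_length=True,            # batch samples of similar length together
    per_device_train_batch_size=16,  # samples per GPU per forward/backward pass
    gradient_accumulation_steps=2,
    evaluation_strategy="steps",
    num_train_epochs=30,
    fp16=False,                      # no mixed precision: training runs in fp32
    save_steps=400,
    eval_steps=400,
    logging_steps=400,
    learning_rate=3e-4,
    warmup_steps=500,
    save_total_limit=2,
)
```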
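For reference, the arithmetic behind this config: each GPU holds 16 samples per forward/backward pass, and one optimizer update covers 16 × 2 × 8 = 256 samples across the 8 V100s. Gradient accumulation does not lower per-pass memory, but a smaller per-device batch with more accumulation steps keeps the same 256-sample update while reducing activation memory per pass. The sketch below, with the hypothetical helper `equivalent_settings`, lists such combinations by repeatedly halving the per-device batch. It is a generic workaround for OOMs, not necessarily a fix for a real leak.

```python
def equivalent_settings(per_device_batch, grad_accum_steps, num_gpus):
    """Return the effective batch size and (per_device_batch, grad_accum) pairs that preserve it."""
    effective = per_device_batch * grad_accum_steps * num_gpus
    options = []
    b = per_device_batch
    while b >= 1:
        # keep only per-device batch sizes that divide the effective batch evenly
        if effective % (b * num_gpus) == 0:
            options.append((b, effective // (b * num_gpus)))
        b //= 2  # halve the per-device batch for the next candidate
    return effective, options


effective, options = equivalent_settings(16, 2, 8)
print(effective)  # 256
print(options)    # [(16, 2), (8, 4), (4, 8), (2, 16), (1, 32)]
```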