Trainer API TrainingArguments (where to find more info)

Hello, I am working with the Trainer API and wanted to know if you have a source where the TrainingArguments are explained in more detail. I don't have the vocabulary to understand what each of them means. While training, I am running into a CUDA OOM error and I couldn't pinpoint which arguments control GPU memory usage.

Hi,

Would recommend this guide for efficiently training on your GPU: Methods and tools for efficient training on a single GPU.

I’d recommend starting with a small batch size, using gradient accumulation/checkpointing, and switching to a memory-efficient optimizer like 8-bit Adam, etc.
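Roughly, those settings map onto TrainingArguments like this (just a sketch; the exact values are placeholders to tune for your GPU, and the 8-bit optimizer needs bitsandbytes installed):

```python
from transformers import TrainingArguments

# Sketch of memory-saving settings; the numbers are placeholders to tune.
training_args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,    # start small and raise it until you hit OOM
    gradient_accumulation_steps=16,   # keeps the effective batch size reasonable
    gradient_checkpointing=True,      # trades extra compute for less activation memory
    fp16=True,                        # half precision (use bf16=True on Ampere or newer)
    optim="adamw_bnb_8bit",           # 8-bit Adam via bitsandbytes
)
```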

Also, if you’re training an LLM, it might help to use LoRA instead of full fine-tuning.
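If you do go the LoRA route, the peft library only needs a few extra lines. A minimal sketch, assuming a seq2seq checkpoint and attention-projection target modules (both are just example choices to adapt):

```python
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForSeq2SeqLM

# Sketch only: the checkpoint, rank, and target_modules are example choices.
base_model = AutoModelForSeq2SeqLM.from_pretrained("google/pegasus-cnn_dailymail")
lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=8,                                  # low-rank adapter size
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the small adapter weights are trainable
```

The frozen base weights still sit in GPU memory, but the optimizer states (a common OOM culprit with Adam) only exist for the small adapter matrices.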

I should point out the particulars: I am using the pegasus-dailymail-cnn model, and I have a small dataset with a train/test/val split of 14,372 / 819 / 819 examples. My parameters are currently set as follows, and I am still getting OOM errors. I will check out the link you gave, but I would really welcome any quick ideas based on my current parameters.
params