Hello, when I fine-tune my RAG-based model on my 4×V100 box I'm running into OOM errors on my GPUs. The largest batch size that fits in GPU memory is 1, for both train and eval. Each GPU has about 16 GB of memory, and a batch size of 1 uses between 11 and 15 GB depending on the other parameters I'm using. This could just be the nature of the model, but I want to make sure I'm not doing something wrong that is blowing up the memory. My knowledge dataset is much smaller than the default indexes. I am using the finetuning script in examples/research_projects/rag/finetune_rag.sh. Thank you for your help.
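For reference, this is roughly how I'm invoking the script (a minimal sketch; the flag names are what I believe finetune_rag.sh passes through to finetune_rag.py, so treat them as assumptions rather than the exact script contents, and the paths are placeholders):

```bash
# Sketch of my launch command, assuming the usual finetune_rag.py arguments.
# DATA_DIR / OUTPUT_DIR are placeholders for my local paths.
python examples/research_projects/rag/finetune_rag.py \
    --data_dir $DATA_DIR \
    --output_dir $OUTPUT_DIR \
    --model_name_or_path facebook/rag-sequence-base \
    --model_type rag_sequence \
    --gpus 4 \
    --fp16 \
    --do_train \
    --do_predict \
    --train_batch_size 1 \
    --eval_batch_size 1 \
    --gradient_accumulation_steps 8 \
    --max_source_length 128 \
    --max_target_length 25
```

I'm already using fp16 and compensating for the batch size of 1 with gradient accumulation, so my question is mainly whether 11-15 GB per example is expected for this model, or whether some setting here is inflating memory usage.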