I am using the code here for fine-tuning Pegasus Large. I am doing this on an EC2 instance with 8 GPUs of 12 GB each, so 96 GB total across all 8 GPUs. However, I still get a CUDA Out of Memory error even with a batch size of 1.
I would like to know if there are any recommended hardware requirements for fine-tuning Pegasus Large.
I have a dataset with around 1000 articles and their summaries.
Does the Hugging Face Trainer class make use of all GPUs? Is there some setting or parameter we need to set to make this happen?
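For context, here is a minimal sketch of the kind of setup I mean (the dummy data, model name, hyperparameters, and the memory-saving flags are illustrative placeholders, not my exact script):

```python
# Rough sketch of the fine-tuning setup (values are placeholders).
from datasets import Dataset
from transformers import (
    PegasusForConditionalGeneration,
    PegasusTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainingArguments,
    Seq2SeqTrainer,
)

model_name = "google/pegasus-large"
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name)

# Memory-saving switches I am experimenting with (not sure they are the fix):
model.gradient_checkpointing_enable()  # recompute activations to save memory
model.config.use_cache = False         # cache is not needed during training

# Stand-in for my ~1000 article/summary pairs.
raw = Dataset.from_dict({
    "article": ["some long article text ..."] * 8,
    "summary": ["a short summary"] * 8,
})

def preprocess(batch):
    inputs = tokenizer(batch["article"], max_length=1024, truncation=True)
    labels = tokenizer(batch["summary"], max_length=128, truncation=True)
    inputs["labels"] = labels["input_ids"]
    return inputs

tokenized = raw.map(preprocess, batched=True, remove_columns=["article", "summary"])

args = Seq2SeqTrainingArguments(
    output_dir="pegasus-finetune",
    per_device_train_batch_size=1,   # already at 1 and still OOM
    gradient_accumulation_steps=8,   # simulate a larger effective batch
    fp16=True,                       # half precision to reduce memory
    num_train_epochs=3,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```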
I found this issue on Google's original GitHub repo. Not sure if it is applicable to the Hugging Face libraries.
Hi, I have a similar question: I duplicated a Hugging Face Space on Nvidia 4xA10G large (96 GB VRAM), but I still got a "CUDA out of memory" error message (from the message it seems I still only have about 24 GB of VRAM). Can I use the full 96 GB of VRAM as one virtual device? Thanks!
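This is roughly how I checked what the process sees (a small sketch; the numbers in the comments are just what I believe I am observing on the 4xA10G Space):

```python
import torch

# List the devices visible to this process and their memory.
print("visible devices:", torch.cuda.device_count())  # I expect 4 on 4xA10G
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(i, props.name, round(props.total_memory / 1024**3, 1), "GB")  # ~24 GB each
```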