I’m fine-tuning Pegasus on my own data, which is about 15,000 examples.

I am finding, when fine-tuning with `pegasus-large`, that the memory requirements for even a batch size of 1 are so extreme that an Nvidia card with 16GB of memory is required just to run it! So at this point I am thinking that maybe my training will run better on the CPU, using a machine with a huge amount of RAM, like 512GB, as this seems to allow a much bigger batch size.
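For what it’s worth, a rough back-of-envelope calculation suggests the 16GB figure is about what you’d expect. Assuming standard fp32 Adam training and roughly 568M parameters for `pegasus-large` (that parameter count is my assumption, not from this thread), the static memory alone, before any activations, is most of a 16GB card:

```python
# Rough memory estimate for fp32 Adam fine-tuning of pegasus-large.
# The 568M parameter count is an approximation, not an exact figure.
params = 568_000_000

bytes_per_param = (
    4    # fp32 weights
    + 4  # gradients
    + 8  # Adam optimizer states (momentum + variance, fp32 each)
)

static_gib = params * bytes_per_param / 2**30
print(f"~{static_gib:.1f} GiB before activations")  # ~8.5 GiB
```

Activations for long encoder inputs then take up much of the remainder, which would explain why even batch size 1 barely fits.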
My guess is that the RAM requirements are so extreme because I am using `pegasus-large`. I’m basing this on my understanding of this page:

> All the checkpoints are fine-tuned for summarization, besides pegasus-large, whence the other checkpoints are fine-tuned.
My understanding from this is that, if we as newbie users have some data we want to use with Pegasus, we should do this:
- Start with pegasus-large: https://huggingface.co/google/pegasus-large
- Fine-tune it on our own data
- Use the `pytorch_model.bin` output from this fine-tuning process to run inference on our own data
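To make sure I’m picturing the steps right, here is a minimal sketch of that workflow with the `transformers` Seq2SeqTrainer. The hyperparameters (epochs, accumulation steps, etc.) are placeholder guesses on my part, not a tested recipe, and the function is only defined here, not run:

```python
def finetune_pegasus(train_dataset, output_dir="pegasus-finetuned"):
    """Sketch: fine-tune google/pegasus-large, then reload the saved
    weights (pytorch_model.bin lives inside output_dir) for inference.
    Hyperparameters are illustrative guesses, not recommendations."""
    from transformers import (
        PegasusForConditionalGeneration,
        PegasusTokenizer,
        Seq2SeqTrainer,
        Seq2SeqTrainingArguments,
    )

    tokenizer = PegasusTokenizer.from_pretrained("google/pegasus-large")
    model = PegasusForConditionalGeneration.from_pretrained("google/pegasus-large")

    args = Seq2SeqTrainingArguments(
        output_dir=output_dir,
        per_device_train_batch_size=1,  # what fits on a 16GB card
        gradient_accumulation_steps=8,  # simulate a larger effective batch
        num_train_epochs=3,
        fp16=True,                      # reduces activation memory on GPU
    )
    trainer = Seq2SeqTrainer(model=model, args=args, train_dataset=train_dataset)
    trainer.train()
    trainer.save_model(output_dir)  # writes pytorch_model.bin + config

    # Step 3: inference with the fine-tuned weights.
    model = PegasusForConditionalGeneration.from_pretrained(output_dir)
    batch = tokenizer(["some document text"], truncation=True, return_tensors="pt")
    summary_ids = model.generate(**batch)
    return tokenizer.batch_decode(summary_ids, skip_special_tokens=True)
```

Gradient accumulation would at least give an effective batch larger than 1 even on a single 16GB GPU, if my understanding is right.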
Am I getting something wrong here? Given that I have 15,000 examples, have I made the correct determination that I should fine-tune `pegasus-large`, and that this will lead to the best results, even though the memory requirements are huge?
I looked for distilled models here: https://huggingface.co/models?search=pegasus … But my understanding (possibly wrong?) is that these distilled models are ALREADY fine-tuned, so they would not be appropriate to use, given that I have a lot of my OWN data to fine-tune with.