I’m fine-tuning Pegasus on my own data, which is about 15,000 examples.

I am finding, when fine-tuning with `pegasus-large`, that the memory requirements for even a batch size of 1 are so extreme that an Nvidia card with 16GB of memory is required just to run it! So at this point I am thinking that maybe my training will run better on the CPU, using a machine with a huge amount of RAM, like 512GB, as this seems to allow a much bigger batch size.
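For what it’s worth, a rough back-of-envelope calculation suggests the 16GB figure is about what you’d expect. Assuming standard fp32 Adam training and roughly 568M parameters for `pegasus-large` (that parameter count is my assumption, not from this thread), the static memory alone, before any activations, is most of a 16GB card:

```python
# Rough memory estimate for fp32 Adam fine-tuning of pegasus-large.
# The 568M parameter count is an approximation, not an exact figure.
params = 568_000_000

bytes_per_param = (
    4    # fp32 weights
    + 4  # gradients
    + 8  # Adam optimizer states (momentum + variance, fp32 each)
)

static_gib = params * bytes_per_param / 2**30
print(f"~{static_gib:.1f} GiB before activations")  # ~8.5 GiB
```

Activations for long encoder inputs then take up much of the remainder, which would explain why even batch size 1 barely fits.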
My guess is that the RAM requirements are so extreme because I am using `pegasus-large`. I’m basing this on my understanding of this page:

> All the checkpoints are fine-tuned for summarization, besides pegasus-large, whence the other checkpoints are fine-tuned.
My understanding from this is that, if we as newbie users have some data we want to use with Pegasus, we should do this:
- Start with pegasus-large: https://huggingface.co/google/pegasus-large
- Fine-tune it on our own data
- Use the `pytorch_model.bin` output from this fine-tuning process to run inference on our own data
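To make sure I’m picturing the steps right, here is a minimal sketch of that workflow with the `transformers` Seq2SeqTrainer. The hyperparameters (epochs, accumulation steps, etc.) are placeholder guesses on my part, not a tested recipe, and the function is only defined here, not run:

```python
def finetune_pegasus(train_dataset, output_dir="pegasus-finetuned"):
    """Sketch: fine-tune google/pegasus-large, then reload the saved
    weights (pytorch_model.bin lives inside output_dir) for inference.
    Hyperparameters are illustrative guesses, not recommendations."""
    from transformers import (
        PegasusForConditionalGeneration,
        PegasusTokenizer,
        Seq2SeqTrainer,
        Seq2SeqTrainingArguments,
    )

    tokenizer = PegasusTokenizer.from_pretrained("google/pegasus-large")
    model = PegasusForConditionalGeneration.from_pretrained("google/pegasus-large")

    args = Seq2SeqTrainingArguments(
        output_dir=output_dir,
        per_device_train_batch_size=1,  # what fits on a 16GB card
        gradient_accumulation_steps=8,  # simulate a larger effective batch
        num_train_epochs=3,
        fp16=True,                      # reduces activation memory on GPU
    )
    trainer = Seq2SeqTrainer(model=model, args=args, train_dataset=train_dataset)
    trainer.train()
    trainer.save_model(output_dir)  # writes pytorch_model.bin + config

    # Step 3: inference with the fine-tuned weights.
    model = PegasusForConditionalGeneration.from_pretrained(output_dir)
    batch = tokenizer(["some document text"], truncation=True, return_tensors="pt")
    summary_ids = model.generate(**batch)
    return tokenizer.batch_decode(summary_ids, skip_special_tokens=True)
```

Gradient accumulation would at least give an effective batch larger than 1 even on a single 16GB GPU, if my understanding is right.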
Am I getting something wrong here? Given that I have 15,000 examples, have I made the correct determination that I should fine-tune `pegasus-large`, and that this will lead to the best results, even though the memory requirements are huge?
I looked for distilled models here: https://huggingface.co/models?search=pegasus … But my understanding (possibly wrong?) is that these distilled models are ALREADY fine-tuned, so they would not be appropriate to use, given that I have a lot of my OWN data to fine-tune with.