Can someone help me? I am trying to fine-tune GPT-2 on a RunPod instance. Is there a better way to get my dataset onto the RunPod so I am not wasting time and money on the cloud? I was hoping I could load, tokenize, and generate the splits at home, then save the processed dataset so the pod only has to do the actual training — something like the second sketch at the end of this post.
Right now I do all of the following on the pod (roughly like the first sketch below the list):
- Load Model and tokenizer
- Load Dataset and tokenize it
- Train model
- Save model
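This is roughly what I run on the pod right now; I'm on the Hugging Face transformers/datasets Trainer setup, and the dataset name, column names, and hyperparameters here are just placeholders:

```python
from transformers import (GPT2LMHeadModel, GPT2TokenizerFast, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)
from datasets import load_dataset

# Load model and tokenizer
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Load dataset and tokenize it (this all happens on the GPU instance today)
dataset = load_dataset("my_dataset")  # placeholder name

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# Train model
args = TrainingArguments(output_dir="out", num_train_epochs=1,
                         per_device_train_batch_size=4)
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)
trainer = Trainer(model=model, args=args,
                  train_dataset=tokenized["train"],
                  data_collator=collator)
trainer.train()

# Save model
trainer.save_model("out/final")
```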
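And this is roughly what I had in mind for doing the data work at home first, assuming `save_to_disk` / `load_from_disk` from the datasets library is the right way to move the already-tokenized splits to the pod (again, "my_dataset" and the paths are placeholders):

```python
# On my local machine: load, tokenize, and split, then write everything to disk
from transformers import GPT2TokenizerFast
from datasets import load_dataset

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

dataset = load_dataset("my_dataset")  # placeholder name

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# Generate the train/test splits locally as well
splits = tokenized["train"].train_test_split(test_size=0.1, seed=42)

# Save the fully processed splits so they can be copied up to the pod
splits.save_to_disk("processed_dataset")
```

```python
# On the RunPod instance: just load the already-tokenized splits and train
from datasets import load_from_disk

splits = load_from_disk("processed_dataset")
train_ds, eval_ds = splits["train"], splits["test"]
# ...hand train_ds / eval_ds straight to the Trainer, no tokenization step on the pod
```

Is this the right approach, or is there a better way to avoid paying for the preprocessing time on the GPU instance?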