Can someone help me? I am trying to fine-tune GPT-2 on a RunPod instance. Is there a better way to get my dataset onto the RunPod so I am not wasting time and money on the cloud? I was hoping I could load, tokenize, and generate the splits at home, then save the processed dataset so the pod only has to do the actual training — something like the second sketch at the end of this post.
Right now I do all of the following on the pod (roughly like the first sketch below the list):
- Load Model and tokenizer
- Load Dataset and tokenize it
- Train model
- Save model
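This is roughly what I run on the pod right now; I'm on the Hugging Face transformers/datasets Trainer setup, and the dataset name, column names, and hyperparameters here are just placeholders:

```python
from transformers import (GPT2LMHeadModel, GPT2TokenizerFast, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)
from datasets import load_dataset

# Load model and tokenizer
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Load dataset and tokenize it (this all happens on the GPU instance today)
dataset = load_dataset("my_dataset")  # placeholder name

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# Train model
args = TrainingArguments(output_dir="out", num_train_epochs=1,
                         per_device_train_batch_size=4)
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)
trainer = Trainer(model=model, args=args,
                  train_dataset=tokenized["train"],
                  data_collator=collator)
trainer.train()

# Save model
trainer.save_model("out/final")
```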
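And this is roughly what I had in mind for doing the data work at home first, assuming `save_to_disk` / `load_from_disk` from the datasets library is the right way to move the already-tokenized splits to the pod (again, "my_dataset" and the paths are placeholders):

```python
# On my local machine: load, tokenize, and split, then write everything to disk
from transformers import GPT2TokenizerFast
from datasets import load_dataset

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

dataset = load_dataset("my_dataset")  # placeholder name

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# Generate the train/test splits locally as well
splits = tokenized["train"].train_test_split(test_size=0.1, seed=42)

# Save the fully processed splits so they can be copied up to the pod
splits.save_to_disk("processed_dataset")
```

```python
# On the RunPod instance: just load the already-tokenized splits and train
from datasets import load_from_disk

splits = load_from_disk("processed_dataset")
train_ds, eval_ds = splits["train"], splits["test"]
# ...hand train_ds / eval_ds straight to the Trainer, no tokenization step on the pod
```

Is this the right approach, or is there a better way to avoid paying for the preprocessing time on the GPU instance?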