I’m fine-tuning GPT-2 on a large dataset and have used up all of my disk space on Google Drive. I’m not sure which arguments control how many checkpoints get saved. Right now I’m getting a new checkpoint every 500 steps, but I’d like to avoid creating so many checkpoints, or at the very least keep only the best one.
For the next fine-tuning run I’ll be using the following command (waiting to regain access to a Colab GPU), but I’m not sure it’ll prevent the extra checkpoints:
!python gpt-2/run_clm.py \
--model_name_or_path gpt2 \
--train_file alignment_texts_87606.csv \
--do_train \
--fp16 \
--overwrite_cache \
--overwrite_output_dir \
--num_train_epochs 1 \
--per_device_train_batch_size=2 \
--output_dir gpt-2/tmp/alignment-texts-clm
I don’t think it will. I need something that limits the number of checkpoints, and I don’t even know where the 500-step checkpoint interval comes from.
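From skimming the Hugging Face `TrainingArguments` reference, I suspect the 500 is the default value of `--save_steps`, and `--save_total_limit` caps how many checkpoints are kept (older ones get deleted as new ones are written). If that’s right, something like this might work — the `--save_steps 5000` value is just an arbitrary larger interval I picked, and I haven’t tested this yet:

```shell
!python gpt-2/run_clm.py \
--model_name_or_path gpt2 \
--train_file alignment_texts_87606.csv \
--do_train \
--fp16 \
--overwrite_cache \
--overwrite_output_dir \
--num_train_epochs 1 \
--per_device_train_batch_size=2 \
--save_steps 5000 \
--save_total_limit 1 \
--output_dir gpt-2/tmp/alignment-texts-clm
```

I believe keeping only the *best* checkpoint (rather than the most recent) would additionally need an eval set plus `--evaluation_strategy steps` and `--load_best_model_at_end`, but I’m less sure about that part.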
Thanks for the help!