I’m fine-tuning GPT-2 XL for 3 epochs, and I’d like to recover exactly which training data the model has seen every 242 steps (my checkpoint interval). I thought of computing the number of rows consumed by multiplying the batch size by the number of steps and slicing the original training dataset (input_ids) accordingly, but I’m guessing the data is shuffled during training, so a slice in the original order might not be the right thing to do. I’d appreciate any help or pointers.
These are my training args:
```python
training_args = TrainingArguments(
    output_dir="models/XL/",
    evaluation_strategy="steps",
    learning_rate=2e-5,
    weight_decay=0.01,
    push_to_hub=False,
    num_train_epochs=3,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    save_strategy="steps",
    save_steps=242,
    fp16=True,
    report_to="none",
    logging_strategy="steps",
    logging_steps=50,
)
```
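For what it’s worth, the slicing idea can work if you can reproduce the shuffle. The Trainer shuffles the training set each epoch with a seeded generator, so in principle you can regenerate the same permutation and slice it. The sketch below illustrates the principle only, using Python’s `random` with a fixed seed; the Trainer’s actual sampler (a seeded `torch.Generator` feeding a `RandomSampler`, which varies across library versions) would need to be mirrored exactly for real use, and the hypothetical helper name `rows_seen` is mine:

```python
import random

def rows_seen(num_rows, batch_size, save_steps, checkpoint_idx, seed=42):
    """Return the dataset row indices consumed between checkpoint
    (checkpoint_idx - 1) and checkpoint_idx, assuming the epoch order
    is a random permutation produced with a known seed.

    NOTE: illustration only -- the HF Trainer's real shuffle uses a
    seeded torch.Generator, not Python's random module.
    """
    # Reconstruct the (assumed) shuffled visitation order for the epoch.
    order = list(range(num_rows))
    random.Random(seed).shuffle(order)
    # Each step consumes one batch, so each checkpoint interval
    # covers save_steps * batch_size rows.
    start = (checkpoint_idx - 1) * save_steps * batch_size
    end = checkpoint_idx * save_steps * batch_size
    return order[start:end]
```

With `per_device_train_batch_size=8` and `save_steps=242`, each checkpoint interval covers 242 × 8 = 1936 rows, and consecutive intervals are disjoint within an epoch since the permutation visits each row once.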