Is it possible to get the data that is seen by the model during training?

I’m fine-tuning GPT-2 XL for 3 epochs and would like to know which training data the model has seen every 242 steps. My first idea was to multiply the batch size by the number of steps and take that many rows from the start of the original training dataset (input_ids). However, I suspect the data is shuffled during training, so those rows might not be the ones the model actually saw. I’d appreciate any help or pointers.
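Multiplying the batch size by the number of steps does give the right count of rows per 242-step window, provided every device and every gradient-accumulation step is included. It just cannot tell you which rows, because the Trainer draws training examples in random order by default. A minimal sketch of the counting arithmetic, assuming the last partial batch of an epoch is kept (the default `dataloader_drop_last=False`) and, for `steps_per_epoch`, no gradient accumulation. The function names are illustrative, not part of any library:

```python
import math


def rows_per_window(per_device_batch_size: int,
                    steps_per_window: int,
                    num_devices: int = 1,
                    gradient_accumulation_steps: int = 1) -> int:
    """Number of training rows consumed between two checkpoints.

    Each optimizer step consumes one batch per device per accumulation step.
    """
    rows_per_step = per_device_batch_size * num_devices * gradient_accumulation_steps
    return rows_per_step * steps_per_window


def steps_per_epoch(num_rows: int, per_device_batch_size: int,
                    num_devices: int = 1) -> int:
    """Optimizer steps per epoch, assuming no gradient accumulation and
    that the last (possibly smaller) batch is kept."""
    return math.ceil(num_rows / (per_device_batch_size * num_devices))


print(rows_per_window(8, 242))    # rows seen per 242-step window on one GPU
print(steps_per_epoch(10001, 8))  # last batch of the epoch holds a single row
```

Because an epoch rarely ends exactly on a multiple of 242 steps, a window can straddle two epochs, and the batch at the end of each epoch may be smaller than 8.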

These are my training args:
from transformers import TrainingArguments

training_args = TrainingArguments(
    "models/XL/",
    evaluation_strategy="steps",
    learning_rate=2e-5,
    weight_decay=0.01,
    push_to_hub=False,
    num_train_epochs=3,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    save_strategy="steps",
    save_steps=242,
    fp16=True,
    report_to="none",
    logging_strategy="steps",
    logging_steps=50,
)
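With `save_steps=242`, the Trainer writes `checkpoint-242`, `checkpoint-484`, and so on, so each checkpoint marks the end of one 242-step window. Once the per-epoch order of row indices is known, assigning rows to windows is bookkeeping. The sketch below is a pure-Python stand-in: it shuffles indices with `random.Random(seed)`, which is NOT the Trainer's PyTorch sampler, so the specific indices it produces are illustrative only. It assumes one device and no gradient accumulation, matching the arguments above, and the helper names are hypothetical:

```python
import random
from typing import List


def shuffled_epochs(num_rows: int, num_epochs: int, seed: int = 42) -> List[List[int]]:
    """Hypothetical stand-in for the training sampler: one permutation per epoch.

    Uses Python's `random`, so the order will not match a real Trainer run.
    """
    rng = random.Random(seed)
    orders = []
    for _ in range(num_epochs):
        order = list(range(num_rows))
        rng.shuffle(order)
        orders.append(order)
    return orders


def windows_of_rows(epoch_orders: List[List[int]], batch_size: int,
                    steps_per_window: int) -> List[List[int]]:
    """Group row indices into windows of `steps_per_window` optimizer steps.

    Batches are cut within each epoch (the last one may be smaller), then
    consecutive batches are grouped, so a window can span two epochs.
    """
    batches = []
    for order in epoch_orders:
        for start in range(0, len(order), batch_size):
            batches.append(order[start:start + batch_size])
    windows = []
    for start in range(0, len(batches), steps_per_window):
        chunk = batches[start:start + steps_per_window]
        windows.append([row for batch in chunk for row in batch])
    return windows


orders = shuffled_epochs(num_rows=5000, num_epochs=3)
windows = windows_of_rows(orders, batch_size=8, steps_per_window=242)
# windows[k] holds the rows used in steps 242*k + 1 through 242*(k + 1)
```

To recover the order a real run used, one option is to record the batches themselves, for example by subclassing `Trainer` and overriding `training_step` (listed in the Trainer docs as a method meant for customization) to store `inputs["input_ids"]` together with `self.state.global_step`. The recorded batches can then be grouped every 242 steps in the same way.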

@sgugger could you help here? 🙂