I would like to see a few examples so I can manually verify the input and output. An older version of transformers had functionality for this. Is there something similar for the Trainer?
This is not related to the Trainer; you just have to print some elements of your dataset. This is implemented in all the examples, see for instance this one.
Thanks!
A follow-up question: how could I print a few examples together with their outputs, say per batch? As a concrete example, given a Seq2SeqTrainer, I want to check the raw output during training as a measure of progress. Is that doable?
You can get the first batch of the training dataloader by doing:
for batch in trainer.get_train_dataloader():
    break
and then access the sentences in batch. You will need to decode the inputs using your tokenizer though.
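To make the decoding step concrete, here is a minimal sketch. It assumes the batch is a dict with an "input_ids" entry and that your tokenizer exposes batch_decode, as Hugging Face tokenizers do. The names preview_inputs and ToyTokenizer are made up for illustration; ToyTokenizer is only a stand-in so the snippet runs without downloading a model.

```python
class ToyTokenizer:
    """Hypothetical stand-in that mimics `batch_decode` of a real tokenizer."""

    vocab = {0: "<pad>", 1: "hello", 2: "world", 3: "</s>"}
    special_ids = {0, 3}

    def batch_decode(self, sequences, skip_special_tokens=False):
        texts = []
        for ids in sequences:
            # Optionally drop padding / end-of-sequence ids before joining.
            kept = [i for i in ids
                    if not (skip_special_tokens and i in self.special_ids)]
            texts.append(" ".join(self.vocab[i] for i in kept))
        return texts


def preview_inputs(batch, tokenizer, max_examples=3):
    """Decode the first few input sequences of a batch back to text."""
    input_ids = batch["input_ids"][:max_examples]
    return tokenizer.batch_decode(input_ids, skip_special_tokens=True)


# Toy batch standing in for the one taken from the training dataloader.
toy_batch = {"input_ids": [[1, 2, 3, 0], [2, 1, 3, 0]]}
decoded = preview_inputs(toy_batch, ToyTokenizer())
print(decoded)
```

With a real setup you would pass the batch obtained from trainer.get_train_dataloader() and your own tokenizer instead of the toy objects.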
I see. But this only has the input, right? How could I access a few random examples of the output?
output = model(**batch)
will give you the output.
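If the model output carries logits (as seq2seq model outputs normally do), one way to turn them into readable predictions is to take the highest-scoring token at each position and decode those ids. Below is a rough sketch in plain Python; greedy_ids is a made-up helper name, and it assumes the logits have been converted to nested lists, for example with output.logits.tolist(). On tensors, output.logits.argmax(dim=-1) does the same thing directly.

```python
def greedy_ids(logits):
    """Pick the highest-scoring token id at every position.

    `logits` is assumed to be nested lists shaped
    [batch][sequence_length][vocab_size].
    """
    return [
        [max(range(len(step)), key=step.__getitem__) for step in sequence]
        for sequence in logits
    ]


# Toy logits: one sequence, two positions, vocabulary of three tokens.
toy_logits = [[[0.1, 2.0, 0.3], [1.5, 0.2, 0.1]]]
predicted = greedy_ids(toy_logits)
print(predicted)

# With a real model and tokenizer (not run here), the ids would then be
# decoded with something like:
#   tokenizer.batch_decode(predicted, skip_special_tokens=True)
# Depending on your setup, the batch may first need to be moved to the
# same device as the model.
```

Note that during training the decoder is fed the gold labels, so these are per-position predictions rather than free-running generations. For the latter, model.generate on the input ids is the usual route.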