Hi,
I am using the Trainer class to generate sentences/summaries from some textual input. Here is the relevant piece of code:
if training_args.do_predict:
    logger.info("*** Predict ***")

    predict_results = trainer.predict(
        predict_dataset, metric_key_prefix="predict", max_length=max_length, num_beams=num_beams
    )
    metrics = predict_results.metrics
    max_predict_samples = (
        data_args.max_predict_samples if data_args.max_predict_samples is not None else len(predict_dataset)
    )
    metrics["predict_samples"] = min(max_predict_samples, len(predict_dataset))

    trainer.log_metrics("predict", metrics)
    trainer.save_metrics("predict", metrics)

    if trainer.is_world_process_zero():
        if training_args.predict_with_generate:
            predictions = tokenizer.batch_decode(
                predict_results.predictions, skip_special_tokens=True, clean_up_tokenization_spaces=True
            )
            predictions = [pred.strip() for pred in predictions]
            output_prediction_file = os.path.join(training_args.output_dir, "generated_predictions.txt")
            with open(output_prediction_file, "w") as writer:
                writer.write("\n".join(predictions))
When I look at the metrics that are saved, this is what I see:
{'predict_loss': 9.717998504638672, 'predict_rouge1': 27.2727, 'predict_rouge2': 15.0, 'predict_rougeL': 27.2727, 'predict_rougeLsum': 27.2727, 'predict_gen_len': 18.0, 'predict_runtime': 0.7654, 'predict_samples_per_second': 2.613, 'predict_steps_per_second': 2.613, 'predict_samples': 2}
I don't understand what these metrics mean, since I am only generating summaries from some textual input and I don't have any ground truth associated with each sample. What is the ROUGE calculated from? I was also expecting a score for each sample (in my case 2) to be output. How can I get those scores from the prediction results? I have looked through the predict function in the trainer class, and I don't see any way of extracting that information.
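For reference, this is roughly how I am inspecting the returned object (a hypothetical check, just to show where I looked; predictions, label_ids and metrics are the only fields I can find):

# predict_results is a PredictionOutput named tuple; none of its fields
# look like a per-sample generation score
print(type(predict_results))          # transformers.trainer_utils.PredictionOutput
print(predict_results.predictions)    # generated token ids for my samples
print(predict_results.label_ids)      # reference ids, if the dataset had any
print(predict_results.metrics)        # the aggregated dict shown above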
I know model.generate has an output_scores=True argument that I could use, but if I am using the above code and calling predict, why can't I get the scores?
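For context, this is roughly what I mean by calling generate directly (a minimal sketch, assuming the same model, tokenizer, max_length and num_beams as above, with beam search so that sequences_scores is populated):

inputs = tokenizer(["some input text"], return_tensors="pt")
gen_out = model.generate(
    **inputs,
    max_length=max_length,
    num_beams=num_beams,
    output_scores=True,
    return_dict_in_generate=True,
)
# With num_beams > 1 and output_scores=True, gen_out.sequences_scores holds one
# score per generated sequence; this is the kind of per-sample score I would
# like to get back from trainer.predict().
print(gen_out.sequences_scores)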
Thanks.