Generating multiple sequences with `Trainer.predict()`

Hi. I want to generate multiple sequences per input using `Trainer.predict()`, but I can't get it to work.

```python
predict_results = trainer.predict(
    input_dataset["validation"],
    max_length=data_args.val_max_target_length,
    num_beams=generate_args.num_return_sequences * generate_args.beam_width,
    num_return_sequences=generate_args.num_return_sequences,
    num_beam_groups=generate_args.num_beam_groups,
    repetition_penalty=generate_args.repetition_penalty,
    diversity_penalty=generate_args.diversity_penalty,
    early_stopping=generate_args.early_stopping,
)
```

The code above returns only as many sequences as there are entries in the dataset, but I want several per entry. For example, with `num_return_sequences` set to 4, the first 4 generations correspond to sentence 1, the next 4 to sentence 2, and so on, until the output reaches `len(dataset)` sequences in total and stops. So only about the first quarter of the inputs get any generations at all. How can I fix this? Is this a bug?
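To make the expected layout concrete, here is a minimal sketch of the grouping, assuming the generations come back as one flat list in input order with `num_return_sequences` consecutive items per input. `group_generations` is a hypothetical helper written for illustration, not part of `transformers`, and the strings are stand-ins for decoded text.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def group_generations(flat: Sequence[T], num_return_sequences: int) -> List[List[T]]:
    """Regroup a flat list of generations into one sub-list per input.

    Assumes the flat list is ordered input by input, i.e. items
    [0, n) belong to input 0, items [n, 2n) to input 1, and so on.
    """
    n = num_return_sequences
    if n <= 0:
        raise ValueError("num_return_sequences must be positive")
    if len(flat) % n != 0:
        # A length that is not a multiple of n suggests the output was truncated.
        raise ValueError(
            f"got {len(flat)} generations, which is not a multiple of {n}"
        )
    return [list(flat[i:i + n]) for i in range(0, len(flat), n)]


# Stand-in data: 2 inputs, 2 generations each (hypothetical decoded strings).
decoded = ["s1 gen a", "s1 gen b", "s2 gen a", "s2 gen b"]
grouped = group_generations(decoded, num_return_sequences=2)
```

With a complete output, `len(grouped)` equals the number of inputs. With the output I get from `predict()`, the same grouping would give only about `len(dataset) / num_return_sequences` groups, which is the problem described above.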
