New seq2seq tool: search hparam space with run_eval.py

FYI, there is a new tool available to you: you can now search the hparam space used by run_eval.py.

It’s called run_eval_search.py.

It uses the same arguments as run_eval.py, but allows you to parametrize the hparams, so in addition to the normal args you can pass:

--search="num_beams=8:11:15 length_penalty=0.9:1.0:1.1 early_stopping=true:false"

and it’ll search all the possible combinations and, at the end, print a table of results sorted by the task’s score, e.g.:


bleu  | num_beams | length_penalty | early_stopping
----- | --------- | -------------- | --------------
41.35 |        11 |            1.1 |              0
41.33 |        11 |            1.0 |              0
41.33 |        11 |            1.1 |              1
41.32 |        15 |            1.1 |              0
41.29 |        15 |            1.1 |              1
41.28 |        15 |            1.0 |              0
41.25 |         8 |            1.1 |              0
41.24 |        11 |            1.0 |              1
41.23 |        11 |            0.9 |              0
41.20 |        15 |            1.0 |              1
41.18 |         8 |            1.0 |              0

You can search one or more params at once.
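
To make the expansion concrete, here is a minimal sketch (not the actual implementation, which lives in run_eval_search.py) of how such a --search string can be turned into a grid of hparam combinations. As noted further down in this thread, the tool enumerates combinations with itertools.product; the expand_search helper below is hypothetical:

from itertools import product

def expand_search(search: str):
    # 'num_beams=8:11:15 early_stopping=true:false'
    #   -> {'num_beams': ['8', '11', '15'], 'early_stopping': ['true', 'false']}
    grid = {}
    for assignment in search.split():
        name, values = assignment.split("=")
        grid[name] = values.split(":")
    # Cartesian product of all value lists -> one dict per combination
    return [dict(zip(grid, combo)) for combo in product(*grid.values())]

combos = expand_search("num_beams=8:11:15 length_penalty=0.9:1.0:1.1 early_stopping=true:false")
print(len(combos))  # 18 (3 * 3 * 2)
print(combos[0])    # {'num_beams': '8', 'length_penalty': '0.9', 'early_stopping': 'true'}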

Here is an example of a full command:

PYTHONPATH="src:examples/seq2seq" python examples/seq2seq/run_eval_search.py \
facebook/wmt19-$PAIR $DATA_DIR/val.source $SAVE_DIR/test_translations.txt \
--reference_path $DATA_DIR/val.target --score_path $SAVE_DIR/test_bleu.json \
--bs $BS --task translation \
--search="num_beams=1:5 length_penalty=0.9:1.1 early_stopping=true:false"

If you encounter any issues please let me know.

It’s documented here: https://github.com/huggingface/transformers/blob/master/examples/seq2seq/README.md#run_eval-tips-and-tricks. @sshleifer and I added some more goodies in run_eval.py - you will find them all documented at that url.

Enjoy.

P.S. Edited to remove things that are going to change, based on Sam’s comment below.


Great work!

There are only two possible sets of keys to get from run_eval.py, since:
score_fn = calculate_bleu_score if "translation" in args.task else calculate_rouge

You shouldn’t hard-code the possible tasks any more than that, IMO.
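
For reference, a minimal sketch of what deriving the metric keys from the task the same way could look like (the exact key names returned by calculate_bleu_score and calculate_rouge are an assumption here):

def task_score_keys(task: str):
    # Mirror run_eval.py's score_fn selection instead of hard-coding task names.
    if "translation" in task:
        return ["bleu"]  # assumed key reported by calculate_bleu_score
    return ["rouge1", "rouge2", "rougeL"]  # assumed keys reported by calculate_rouge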


Ah, thank you for clarifying that. I will adjust it to follow the same logic.

This is awesome! Thanks @stas


I haven’t checked the code; I’m on mobile right now. But are there many scenarios where we actually need to do hyperparameter search on the evaluation/inference side? In addition, does this use the optuna implementation that is being worked on in the trainer by @sgugger, or is it a separate implementation?

When you train a seq2seq model on a new summarization or translation dataset (or any other seq2seq task) and want to decide how many beams to use, whether to apply a length penalty, what the max seq length should be, what no_repeat_ngram_size should be, etc., all of these parameters affect the metrics, so this tool helps you make those decisions.

It does not use optuna; it just uses itertools.product to enumerate the different combinations and evaluates each of them, i.e. it is a plain exhaustive grid search rather than a guided search.
