Seq2seq evaluation speed is slow

marton-avrios · July 14, 2020, 5:12pm

While running the seq2seq examples following the README, I found that training is relatively fast and uses >50% of the GPU, while evaluation (with the exact same batch size) is painfully slow with low GPU utilization. This is true for both T5 and BART. What am I doing wrong?

chrisdoyleIE · July 15, 2020, 11:57am

As a guess, evaluation requires a generation procedure, whereas training uses teacher forcing and cross-entropy loss. This might not be the case, though, because the validation step may include some generation. I would highly recommend using the PyCharm debugger to find out which part of the code is taking so long.

valhalla · July 15, 2020, 12:06pm

Yes, this is correct: eval is slow due to generation.
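
To make the distinction concrete, here is a minimal sketch in plain Python rather than the actual model code. `toy_decoder` is a hypothetical stand-in for one decoder forward pass; it only counts how often it is called. With teacher forcing, the ground-truth target (shifted right) is fed to the decoder, so every position is scored in a single pass. Generation has to feed the model's own previous output back in, so it needs one sequential pass per generated token.

```python
# Toy illustration only: `toy_decoder` is a hypothetical stand-in for one
# decoder forward pass, not a real model.
calls = {"forward": 0}


def toy_decoder(decoder_input):
    """Return one 'predicted' token per input position (here: input + 1)."""
    calls["forward"] += 1
    return [tok + 1 for tok in decoder_input]


def teacher_forced_step(target, bos=0):
    """Training / validation-loss style: feed the shifted gold target once."""
    decoder_input = [bos] + target[:-1]
    return toy_decoder(decoder_input)  # all positions scored in one pass


def greedy_generate(max_len, bos=0):
    """Generation style: each new token depends on the previous output."""
    sequence = [bos]
    for _ in range(max_len):
        predictions = toy_decoder(sequence)  # one pass per new token
        sequence.append(predictions[-1])
    return sequence[1:]


target = [1, 2, 3, 4, 5]

calls["forward"] = 0
teacher_forced_step(target)
teacher_forcing_passes = calls["forward"]

calls["forward"] = 0
generated = greedy_generate(max_len=len(target))
generation_passes = calls["forward"]

print(teacher_forcing_passes, generation_passes)  # prints: 1 5
```

The passes in the generation loop cannot run in parallel, because step t needs the token chosen at step t-1. That sequential dependency is a plausible reason for the low GPU utilization reported above.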

marton-avrios · July 15, 2020, 3:39pm

Is it possible to turn off generation but still get validation loss and save the best checkpoint based on that? There is no need for generation if I do not want ROUGE or BLEU scores during training, right?

chrisdoyleIE · July 16, 2020, 11:04am

If you install from source, you can edit finetune.py to do as you describe

The function you want is here, and you could introduce a switch to only generate if called from test_step.

Validation loss is also calculated within this function and is the basis upon which checkpoints are ranked (as far as I understand)
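
A rough sketch of such a switch, assuming a module shaped loosely like the one in finetune.py. Every name here (`ToySummarizer`, `_loss`, `_generate`, `_eval_step`, the `generate` flag) is hypothetical and stubbed so the snippet runs on its own; the real file's method names and return values may differ.

```python
class ToySummarizer:
    """Hypothetical stand-in for the training module in finetune.py."""

    def __init__(self):
        self.generate_calls = 0

    def _loss(self, batch):
        # Stub for the teacher-forced forward pass that yields the loss.
        return sum(batch["labels"]) / len(batch["labels"])

    def _generate(self, batch):
        # Stub for the slow autoregressive generation call.
        self.generate_calls += 1
        return ["generated text"] * len(batch["labels"])

    def _eval_step(self, batch, generate):
        metrics = {"loss": self._loss(batch)}
        if generate:
            # Only pay for generation when text metrics are wanted.
            metrics["preds"] = self._generate(batch)
        return metrics

    def validation_step(self, batch, batch_idx):
        # Loss only: cheap enough to run often during training.
        return self._eval_step(batch, generate=False)

    def test_step(self, batch, batch_idx):
        # Full generation, so ROUGE/BLEU-style metrics can be computed.
        return self._eval_step(batch, generate=True)


model = ToySummarizer()
batch = {"labels": [1.0, 2.0, 3.0]}
val_out = model.validation_step(batch, 0)
test_out = model.test_step(batch, 0)
```

With this layout, validation yields only a loss, so checkpoint selection would have to monitor that loss rather than a generation-based score.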

marton-avrios · July 16, 2020, 11:12am

I think checkpoints are selected based on val_metric, which can be anything I write code for (currently ROUGE and BLEU are implemented; I already added Levenshtein distance, but that also requires generation).
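
Whichever metric is monitored, the direction matters: loss and Levenshtein distance should be minimized, while ROUGE and BLEU should be maximized. Below is a small sketch of that selection logic in plain Python. It is not the script's own code, and the checkpoint names and scores are made up purely for illustration.

```python
def best_checkpoint(history, metric, mode):
    """Pick the checkpoint with the best value of `metric`.

    history: list of (checkpoint_name, {metric_name: value}) pairs.
    mode: "min" for losses and distances, "max" for scores such as ROUGE.
    """
    if mode not in ("min", "max"):
        raise ValueError("mode must be 'min' or 'max'")
    pick = min if mode == "min" else max
    return pick(history, key=lambda item: item[1][metric])[0]


# Made-up numbers, for illustration only.
history = [
    ("step_100", {"val_loss": 2.31, "rouge2": 14.2}),
    ("step_200", {"val_loss": 1.87, "rouge2": 17.9}),
    ("step_300", {"val_loss": 1.92, "rouge2": 18.4}),
]

by_loss = best_checkpoint(history, "val_loss", "min")
by_rouge = best_checkpoint(history, "rouge2", "max")
```

In this invented history the two criteria disagree (`step_200` by loss, `step_300` by ROUGE), which is the trade-off to keep in mind when dropping generation from validation: the checkpoint with the lowest loss is not guaranteed to be the one with the best generated text.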

jenniferL · February 26, 2022, 10:38pm

Hi @valhalla @chrisdoyleIE,

Thank you very much for answering the question.

Could you please explain a little more about what ‘generation’ means in this context? Personally, I feel like training still involves generation. In a forward pass, after the encoder passes the context vector to the decoder, the decoder still has to ‘generate’ a sequence. And during the generation of a sequence, at each step, the (cross-entropy) loss function can calculate the difference between the softmax-ed distribution over all tokens and the ground-truth distribution. If the whole predicted sequence still has to be ‘generated’ in order to calculate the loss, why would the answer be that ‘generation’ is performed solely for validation, and that this is the reason it is time-consuming? I appreciate any feedback on this. Thanks in advance!

Hi @chrisdoyleIE,

Could you please elaborate more on how teacher forcing can speed up the training process? Thank you!

anujsahani01 · June 20, 2023, 11:51pm

Using evaluators is a better and faster option; there are different evaluators for different tasks.

from evaluate import TranslationEvaluator
metric = TranslationEvaluator(task="translation", default_metric_name="bleu")
evaluation_results = metric.compute(model_or_pipeline=model, tokenizer=tokenizer, metric="bleu", data=dataset["test"], device=0, split="test", input_column="inputs", label_column="targets")

I used this for my translation model.
One thing to note: the dataset passed here should be plain text data, not tokenized data.

Refer to this article.
