Hello,
I use a custom MBart-based model and beam_search for generation.
I observed that, depending on model training (more specifically, on the learning rate used to train the model), the model does not meet the stopping criteria for beam_search; that is, beam_scorer.is_done is always False, so the code runs until it exhausts GPU memory.
Here is the code snippet and the parameters I use for generation.
```python
eval_generated = self.model.generate(
    input_ids=dev_input["input_ids"],
    attention_mask=dev_input["attention_mask"],
    decoder_start_token_id=decoder_start_token_id,
    forced_bos_token_id=forced_bos_token_id,
    bad_words_ids=bad_words_ids,
    num_beams=5,
    max_new_tokens=512,
    early_stopping=True,
)
```
The weird thing is that the behavior is not consistent: depending on the learning rate I use, the model sometimes generates outputs successfully without a CUDA OOM error. I have an A100 GPU with 80 GB of memory, so memory capacity shouldn't be the issue…
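For completeness, this is the kind of explicit length cap I am considering as a workaround, reusing the same variables as in the snippet above. It is only a sketch assuming the stock StoppingCriteriaList / MaxLengthCriteria API from transformers, and I have not verified that it actually avoids the OOM; I would still like to understand the root cause.

```python
from transformers import MaxLengthCriteria, StoppingCriteriaList

# Hard cap on the decoder sequence length as a safety net, in case the beam
# scorer never reports that it is done. 512 mirrors max_new_tokens above.
stopping_criteria = StoppingCriteriaList([MaxLengthCriteria(max_length=512)])

eval_generated = self.model.generate(
    input_ids=dev_input["input_ids"],
    attention_mask=dev_input["attention_mask"],
    decoder_start_token_id=decoder_start_token_id,
    forced_bos_token_id=forced_bos_token_id,
    bad_words_ids=bad_words_ids,
    num_beams=5,
    max_new_tokens=512,
    early_stopping=True,
    stopping_criteria=stopping_criteria,
)
```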
Does anybody know how to fix this problem?
Thank you very much,