I use a custom MBart-based model and generate with beam search.
I've observed that, depending on how the model is trained (more specifically, on the learning rate used), the model never meets the stopping criteria: `beam_scorer.is_done` stays `False`, so generation runs until it exhausts GPU memory.
Here's the code snippet and the parameters I use for generation:

```python
eval_generated = self.model.generate(
    input_ids=dev_input["input_ids"],
    attention_mask=dev_input["attention_mask"],
    decoder_start_token_id=decoder_start_token_id,
    forced_bos_token_id=forced_bos_token_id,
    bad_words_ids=bad_words_ids,
    num_beams=5,
    max_new_tokens=512,
    early_stopping=True,
)
```
The weird thing is that the behavior isn't consistent: with some learning rates, the model successfully generates outputs without a CUDA OOM error. I'm on an A100 GPU with 80GB+ memory, so capacity shouldn't be the issue…
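For context, here is my understanding of the stopping logic as a simplified sketch (illustration only, not the actual `transformers` implementation): the beam scorer only reports `is_done` once every beam slot holds a finished (EOS-terminated) hypothesis, so if the model never emits `eos_token_id`, only the length cap should stop generation. That is why I'd expect `max_new_tokens=512` to bound memory use regardless of training:

```python
# Simplified sketch of beam-search stopping logic (illustration only,
# not the actual transformers implementation).

class BeamTracker:
    def __init__(self, num_beams: int, max_new_tokens: int):
        self.num_beams = num_beams
        self.max_new_tokens = max_new_tokens
        self.finished = 0  # hypotheses that ended with EOS
        self.steps = 0     # decoding steps taken so far

    def step(self, beams_hit_eos: int) -> None:
        # One decoding step; beams_hit_eos is how many beams emitted EOS.
        self.steps += 1
        self.finished += beams_hit_eos

    @property
    def is_done(self) -> bool:
        # Done when every beam slot holds a finished hypothesis,
        # or when the hard length cap is reached.
        return self.finished >= self.num_beams or self.steps >= self.max_new_tokens


tracker = BeamTracker(num_beams=5, max_new_tokens=512)
for _ in range(512):
    tracker.step(beams_hit_eos=0)  # model never emits EOS
print(tracker.is_done)  # → True: the length cap still forces a stop
```

So if `beam_scorer.is_done` stays `False` past the cap, something else must be going on.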
Does anybody know how to fix this problem?
Thank you very much,