Hello,
I am running generation in the following manner:
```python
with torch.no_grad():
    generated_tokens = model.generate(
        desc_tokens.to('cuda'),
        max_new_tokens=256,
        min_new_tokens=256,
        force_words_ids=constraint_tokens,
        num_beams=2,
    )
```
This is unexpectedly slow. Is there a known issue with this combination of options (constrained beam search via `force_words_ids`, plus forcing exactly 256 new tokens with `min_new_tokens=256`)?
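To narrow down which option dominates the runtime, one approach is to time `generate()` under each configuration separately (with and without `force_words_ids`, with and without `min_new_tokens`). A minimal, generic timing helper for that comparison might look like this (the helper name `time_call` is my own; `model` and `desc_tokens` are assumed to be defined as in the snippet above):

```python
import time

def time_call(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) once; return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Hypothetical usage against the snippet above, e.g.:
#   _, t_plain = time_call(model.generate, desc_tokens.to('cuda'),
#                          max_new_tokens=256, num_beams=2)
#   _, t_forced = time_call(model.generate, desc_tokens.to('cuda'),
#                           max_new_tokens=256, min_new_tokens=256,
#                           force_words_ids=constraint_tokens, num_beams=2)
# Comparing t_plain and t_forced shows how much the constraints cost.
```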