I am using the following code to generate predictions from my trained model:
```python
def generate_new_items(model, tokenizer, start_tensor, use_longer_words, num_return=100):
    # Longer items consist of several tokens.
    tokens_per_item = 3 if use_longer_words else 1

    beam_outputs = model.generate(
        start_tensor,
        max_new_tokens=tokens_per_item,
        num_beams=num_return,
        num_return_sequences=num_return,
        early_stopping=True,
        pad_token_id=50256,  # GPT-2 end-of-text token used for padding
    )

    # Decode only the newly generated tokens.
    new_item_tokens = beam_outputs[:, -tokens_per_item:]
    new_items = tokenizer.batch_decode(new_item_tokens, skip_special_tokens=True)

    # Clean up the decoded strings. (split() returns a list, so take the
    # first chunk before stripping.)
    if use_longer_words:
        new_items = [x.split(", ")[0].strip(", \n.") for x in new_items]
    new_items = [x.split(" ")[0].strip(", \n.") for x in new_items]
    new_items = [int(x) for x in new_items if x != ""]
    return new_items
```
However, I find that the beam search used to generate the new items runs extremely slowly.
I already found a similar question here. The suggested solution is to use DeepSpeed, but unfortunately I do not understand how to apply it.
Is there any straightforward way to speed up the beam search?
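One thing I considered (I am not sure whether it fits my use case) is dropping beam search entirely and using sampling instead, since `do_sample`, `top_k`, and `num_return_sequences` are standard arguments of `model.generate` in transformers, and sampling does not have to keep track of 100 beams. A sketch of what I mean, with the same interface as my function above and the integer conversion left out:

```python
def generate_new_items_sampling(model, tokenizer, start_tensor,
                                use_longer_words, num_return=100):
    """Like generate_new_items, but uses multinomial sampling instead of
    beam search, so the cost grows roughly linearly in num_return."""
    tokens_per_item = 3 if use_longer_words else 1
    outputs = model.generate(
        start_tensor,
        max_new_tokens=tokens_per_item,
        do_sample=True,           # sample instead of searching over beams
        top_k=50,                 # restrict sampling to the 50 most likely tokens
        num_return_sequences=num_return,
        pad_token_id=50256,
    )
    # Decode only the newly generated tokens, as before.
    new_item_tokens = outputs[:, -tokens_per_item:]
    new_items = tokenizer.batch_decode(new_item_tokens, skip_special_tokens=True)
    # Same kind of clean-up as in my original function.
    return [x.strip(", \n.") for x in new_items]
```

Would sampling like this give me results that are comparable enough to the beam search output, or is there a way to make the beam search itself faster?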
I would be grateful for any help. Thanks in advance!