I am using the following code to generate predictions from my trained model:
```python
def generate_new_items(model, tokenizer, start_tensor, use_longer_words, num_return=100):
    # Longer items consist of several tokens.
    tokens_per_item = 3 if use_longer_words else 1

    beam_outputs = model.generate(
        start_tensor,
        max_new_tokens=tokens_per_item,
        num_beams=num_return,
        num_return_sequences=num_return,
        early_stopping=True,
        pad_token_id=50256,  # GPT-2 end-of-text token used for padding
    )

    # Decode only the newly generated tokens.
    new_item_tokens = beam_outputs[:, -tokens_per_item:]
    new_items = tokenizer.batch_decode(new_item_tokens, skip_special_tokens=True)

    # Clean up the decoded strings. (split() returns a list, so take the
    # first chunk before stripping.)
    if use_longer_words:
        new_items = [x.split(", ")[0].strip(", \n.") for x in new_items]
    new_items = [x.split(" ")[0].strip(", \n.") for x in new_items]
    new_items = [int(x) for x in new_items if x != ""]
    return new_items
```
However, I find that the beam search used to generate the new items runs extremely slowly.
I already found a similar question here. The suggested solution is to use DeepSpeed, but unfortunately I do not understand how to apply it.
Is there any straightforward way to speed up the beam search?
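One thing I considered (I am not sure whether it fits my use case) is dropping beam search entirely and using sampling instead, since `do_sample`, `top_k`, and `num_return_sequences` are standard arguments of `model.generate` in transformers, and sampling does not have to keep track of 100 beams. A sketch of what I mean, with the same interface as my function above and the integer conversion left out:

```python
def generate_new_items_sampling(model, tokenizer, start_tensor,
                                use_longer_words, num_return=100):
    """Like generate_new_items, but uses multinomial sampling instead of
    beam search, so the cost grows roughly linearly in num_return."""
    tokens_per_item = 3 if use_longer_words else 1
    outputs = model.generate(
        start_tensor,
        max_new_tokens=tokens_per_item,
        do_sample=True,           # sample instead of searching over beams
        top_k=50,                 # restrict sampling to the 50 most likely tokens
        num_return_sequences=num_return,
        pad_token_id=50256,
    )
    # Decode only the newly generated tokens, as before.
    new_item_tokens = outputs[:, -tokens_per_item:]
    new_items = tokenizer.batch_decode(new_item_tokens, skip_special_tokens=True)
    # Same kind of clean-up as in my original function.
    return [x.strip(", \n.") for x in new_items]
```

Would sampling like this give me results that are comparable enough to the beam search output, or is there a way to make the beam search itself faster?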
I would be grateful for any help. Thanks in advance!