How to run inference on multiple input sentences in parallel with beam search = 4?

I was wondering whether it is currently possible to run inference on multiple input sentences at once with beam search = 4.

For example, can the sequential loop below be replaced with parallel generation?
```python
import torch
from transformers import BartForConditionalGeneration, BartTokenizer
from tqdm import tqdm

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model_name = 'facebook/bart-large-cnn'
tokenizer = BartTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name).to(device)

model.eval()

input_sentences = [
    "The sun rises in the east and sets in the west.",
    "Artificial intelligence is transforming the way we live.",
    "The ocean is vast and full of mysteries."
]

for input_text in tqdm(input_sentences):
    # Tokenize the input text
    inputs = tokenizer(input_text, return_tensors='pt', max_length=1024, truncation=True).to(device)

    # Generate sentences
    with torch.no_grad():
        outputs = model.generate(
            inputs['input_ids'],
            max_length=100,  # Adjust based on your desired sentence length
            num_return_sequences=5,  # Return 5 sequences per input (must be <= num_beams)
            num_beams=5,  # Beam search for better quality
            early_stopping=True
        )
```

Batched generation with num_beams > 1 would save a lot of time.
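
For what it's worth, batched beam search has worked for me by tokenizing the whole list with padding and calling `generate()` once. Here is a minimal sketch, assuming a recent transformers version; the `num_beams`/`max_length` values are just illustrative choices:

```python
import torch
from transformers import BartForConditionalGeneration, BartTokenizer

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model_name = "facebook/bart-large-cnn"
tokenizer = BartTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name).to(device)
model.eval()

input_sentences = [
    "The sun rises in the east and sets in the west.",
    "Artificial intelligence is transforming the way we live.",
    "The ocean is vast and full of mysteries."
]

# Tokenize the whole batch at once; padding aligns the sequences so a
# single generate() call can run beam search over all of them in parallel.
inputs = tokenizer(
    input_sentences,
    return_tensors="pt",
    padding=True,
    max_length=1024,
    truncation=True,
).to(device)

with torch.no_grad():
    outputs = model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],  # mask out padding tokens
        max_length=100,
        num_beams=4,             # beam search width
        num_return_sequences=1,  # must be <= num_beams
        early_stopping=True,
    )

# outputs has shape (batch_size * num_return_sequences, seq_len)
for text in tokenizer.batch_decode(outputs, skip_special_tokens=True):
    print(text)
```

The key points, as far as I can tell, are `padding=True` so the batch forms a single tensor, and passing `attention_mask` so beam search ignores the padded positions.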

It seems this was not supported before, but I am not sure whether there is a better way to do it now.
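
Alternatively, the high-level `pipeline` API can handle the batching for you. This is a sketch under the assumption that your transformers version forwards generation kwargs such as `num_beams` through the pipeline call (recent versions do); `batch_size=8` is an arbitrary illustrative choice:

```python
import torch
from transformers import pipeline

summarizer = pipeline(
    "summarization",
    model="facebook/bart-large-cnn",
    device=0 if torch.cuda.is_available() else -1,  # GPU if available
)

input_sentences = [
    "The sun rises in the east and sets in the west.",
    "Artificial intelligence is transforming the way we live.",
    "The ocean is vast and full of mysteries."
]

# The pipeline batches the inputs internally; num_beams is forwarded to generate().
results = summarizer(input_sentences, batch_size=8, num_beams=4, max_length=100)
for r in results:
    print(r["summary_text"])
```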
