Hello Everyone!
I am new to NLP and to the google/pegasus model. I was able to run the code below on an individual text, but I have a DataFrame with a full column of speeches. I am trying to run the model on that column and save the generated summaries as another column, something like this:
df['summary'] = df['final'].apply(lambda x: lex_summarizer(x))
Google Pegasus code:
import torch
from transformers import PegasusForConditionalGeneration, PegasusTokenizer
model_name = 'google/pegasus-xsum'
torch_device = 'cuda' if torch.cuda.is_available() else 'cpu'
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name).to(torch_device)
text = df['final']  # a pandas Series, not a str or a list of str
batch = tokenizer.prepare_seq2seq_batch(text, truncation=True, padding='longest', return_tensors='pt')
translated = model.generate(**batch)
pegasus_text = tokenizer.batch_decode(translated, skip_special_tokens=True)
Error: ValueError: text input must of type str (single example), List[str] (batch or single pretokenized example) or List[List[str]] (batch of pretokenized examples).
I would appreciate it if someone could tell me how to run the above code on a text column in a pandas DataFrame and save the summaries in another column.
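For reference, here is a minimal sketch of one possible approach, not a confirmed fix. The error message suggests the tokenizer received a pandas Series rather than a str or List[str], so converting the column to a plain list and summarizing it in small batches may be enough. The helper names make_pegasus_summarizer and summarize_texts and the batch size of 8 are hypothetical choices for illustration. The sketch also assumes a transformers version where the tokenizer can be called directly, and a column with no missing values.

```python
from typing import Callable, Iterable, List


def make_pegasus_summarizer(model_name: str = "google/pegasus-xsum") -> Callable[[List[str]], List[str]]:
    """Load Pegasus once and return a function that summarizes a list of texts.

    Hypothetical helper: the name and structure are illustrative only.
    """
    # Imported lazily so the batching helper below works without these packages.
    import torch
    from transformers import PegasusForConditionalGeneration, PegasusTokenizer

    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = PegasusTokenizer.from_pretrained(model_name)
    model = PegasusForConditionalGeneration.from_pretrained(model_name).to(device)

    def summarize_batch(texts: List[str]) -> List[str]:
        # The tokenizer accepts List[str]; the encoded batch is moved to the
        # same device as the model before generation.
        batch = tokenizer(
            texts, truncation=True, padding="longest", return_tensors="pt"
        ).to(device)
        with torch.no_grad():
            generated = model.generate(**batch)
        return tokenizer.batch_decode(generated, skip_special_tokens=True)

    return summarize_batch


def summarize_texts(
    texts: Iterable[str],
    summarize_batch: Callable[[List[str]], List[str]],
    batch_size: int = 8,  # assumed value; tune to the available memory
) -> List[str]:
    """Summarize texts in fixed-size batches, preserving the input order."""
    items = [str(t) for t in texts]  # assumes missing values were handled beforehand
    summaries: List[str] = []
    for start in range(0, len(items), batch_size):
        summaries.extend(summarize_batch(items[start:start + batch_size]))
    return summaries


# Intended usage with the DataFrame from the question (downloads the model):
# summarizer = make_pegasus_summarizer()
# df["summary"] = summarize_texts(df["final"].tolist(), summarizer)
```

Batching keeps padding and memory use bounded for long columns, and loading the model once outside the loop avoids reloading it for every row, which a per-row apply with model loading inside would do.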