I’ve been using “Parrot Paraphraser”, but I wanted to try Pegasus and compare the results.
I’m scraping articles from news websites, splitting them into sentences, and running each sentence through the paraphraser. With Pegasus, however, I get the following error:
  File "C:\Python\lib\site-packages\torch\nn\functional.py", line 2044, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self
The code:
def get_response(input_text, num_return_sequences):
    batch = tokenizer.prepare_seq2seq_batch([input_text], truncation=True, padding='longest', max_length=1024, return_tensors="pt").to(torch_device)
    translated = model.generate(**batch, max_length=1024, num_beams=10, num_return_sequences=num_return_sequences, temperature=1.5)
    tgt_text = tokenizer.batch_decode(translated, skip_special_tokens=True)
    return tgt_text

for phrase in introduction_list:
    if len(phrase) > 15 and 'http' not in phrase and 'read more' not in phrase and 'Read more' not in phrase and phrase not in h2list and len(phrase) < 1024:
        para_phrases = get_response(phrase, 1)
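
For readability, the filter in the loop above can be pulled into a small helper. This is only a sketch of the same condition: `is_candidate`, `SKIP_MARKERS`, `min_chars` and `max_chars` are names made up for illustration, not part of any library. Note that `len(phrase)` counts characters, not tokens, so the `< 1024` check does not by itself bound the number of tokens the tokenizer produces.

```python
# Hypothetical helper: same checks as the inline `if`, just named and split up.
SKIP_MARKERS = ('http', 'read more', 'Read more')


def is_candidate(phrase, headings, min_chars=15, max_chars=1024):
    """Return True if the phrase should be sent to the paraphraser."""
    # Skip links and "read more" teasers.
    if any(marker in phrase for marker in SKIP_MARKERS):
        return False
    # Skip phrases that are actually section headings.
    if phrase in headings:
        return False
    # Character-length bounds (characters, not tokens).
    return min_chars < len(phrase) < max_chars
```

With that in place the loop would reduce to `if is_candidate(phrase, h2list): para_phrases = get_response(phrase, 1)`.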