I’ve been using BART to summarize, and I have noticed some of the outputs resembling paraphrases.
Is there a way for me to build on this, and use the model for paraphrasing primarily?
from transformers import BartTokenizer, BartForConditionalGeneration, BartConfig
import torch
model = BartForConditionalGeneration.from_pretrained('facebook/bart-large-cnn')
tokenizer = BartTokenizer.from_pretrained('facebook/bart-large-cnn')
device = torch.device('cpu')
text = "At the core of the United States' mismanagement of the Coronavirus lies its distrust of science"
preprocess_text = text.strip().replace("\n","")
t5_prepared_Text = "summarize: "+preprocess_text
print ("original text preprocessed: \n", preprocess_text)
tokenized_text = tokenizer.encode(t5_prepared_Text, return_tensors="pt").to(device)
summary_ids = model.generate(tokenized_text,
num_beams=10,
no_repeat_ngram_size=1,
min_length=10,
num_return_sequences = 2,
max_length=20,
top_k = 100,
early_stopping=True)
output = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
output1 = tokenizer.decode(summary_ids[1], skip_special_tokens=True)
Summarized Text: The United States' mismanagement of the Coronavirus is rooted in its distrust of science.
I’d like to note that when I do “num_return_sequences” the answers are the same. That makes sense, but is there a way for me to get separate answers? I don’t believe seed to be built-in with BART.