Hello
I tried the facebook/blenderbot-3B model using the Hosted Inference API and it works pretty well (facebook/blenderbot-3B · Hugging Face). Now I tried to use it locally with the Python script shown below. The created responses are much worse than from the inference API and do not make sense most of the time.
Is a different code used for the inference API or did I make a mistake?
from transformers import TFAutoModelForCausalLM, AutoTokenizer, BlenderbotTokenizer, TFBlenderbotForConditionalGeneration, TFT5ForConditionalGeneration, BlenderbotTokenizer, BlenderbotForConditionalGeneration
import tensorflow as tf
import torch
device = "cuda:0" if torch.cuda.is_available() else "cpu"
chat_bots = {
'BlenderBot': [BlenderbotTokenizer.from_pretrained("hyunwoongko/blenderbot-9B"), BlenderbotForConditionalGeneration.from_pretrained("hyunwoongko/blenderbot-9B").to(device)],
}
key = 'BlenderBot'
tokenizer, model = chat_bots[key]
for step in range(100):
new_user_input_ids = tokenizer.encode(input(">> User:") + tokenizer.eos_token, return_tensors='pt').to(device)
if step > 0:
bot_input_ids = torch.cat([chat_history_ids, new_user_input_ids], dim=-1)
else:
bot_input_ids = new_user_input_ids
chat_history_ids = model.generate(bot_input_ids, max_length=1000, pad_token_id=tokenizer.eos_token_id).to(device)
print("Bot: ", tokenizer.batch_decode(chat_history_ids, skip_special_tokens=True)[0])