Hi @mavaron
I don’t think it is possible to pass a chat buffer to a pipeline directly. You could pass the complete history, or parts of it, in your prompt.
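For that first option, a minimal sketch could look like this (I reuse the zephyr model from below; the history content and generation parameters are just placeholders):

```python
from transformers import AutoTokenizer, pipeline

model_id = "HuggingFaceH4/zephyr-7b-beta"
tokenizer = AutoTokenizer.from_pretrained(model_id)
generator = pipeline("text-generation", model=model_id, tokenizer=tokenizer, device_map="auto")

# history you keep yourself, e.g. in a plain list
history = [
    {"role": "user", "content": "My favorite color is yellow, what is yours?"},
    {"role": "assistant", "content": "I don't have one, but yellow is a nice choice!"},
    {"role": "user", "content": "What did I tell you my favorite color was?"},
]

# render the history into a single prompt string and pass that to the pipeline
prompt = tokenizer.apply_chat_template(history, tokenize=False, add_generation_prompt=True)
print(generator(prompt, max_new_tokens=100, return_full_text=False)[0]["generated_text"])
```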
Another possibility would be to create a custom pipeline. I have tested this as follows:
```python
import torch
from pprint import pprint
from transformers import Pipeline, AutoTokenizer, AutoModelForCausalLM

# chat buffer, kept outside the pipeline so it can be inspected or emptied
buffer = []

# helper to wrap a message in the chat-template format
def embed_message(message, role):
    return {
        "role": role,
        "content": message
    }

# custom pipeline
class ChatBufferPipeline(Pipeline):
    def _sanitize_parameters(self, **kwargs):
        preprocess_kwargs = {}
        # how many previous buffer messages to include in the prompt
        if "lookback" in kwargs:
            preprocess_kwargs["lookback"] = kwargs["lookback"]
        return preprocess_kwargs, {}, {}

    def preprocess(self, prompt, lookback=None):
        # initial system message
        messages = [
            {
                "role": "system",
                "content": "You are a friendly chatbot who answers user questions. You can use the previous examples if this helps you.",
            },
        ]
        # prepend the chat history
        if lookback:
            messages += buffer[-lookback:]
        # embed the new user message in template format and add it to the buffer
        user_message = embed_message(prompt, "user")
        messages.append(user_message)
        buffer.append(user_message)
        # render the conversation with the model's chat template and tokenize
        text = self.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
        return self.tokenizer(text, return_tensors="pt").input_ids.cuda()

    def _forward(self, model_inputs):
        outputs = self.model.generate(model_inputs, max_new_tokens=250, min_new_tokens=20)
        # keep the inputs so postprocess can strip the prompt from the output
        return {"outputs": outputs, "inputs": model_inputs}

    def postprocess(self, model_outputs):
        outputs = model_outputs["outputs"]
        inputs = model_outputs["inputs"]
        # decode only the newly generated tokens and store them in the buffer
        assistant_output = self.tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True)
        buffer.append(embed_message(assistant_output, "assistant"))
        full_dialog = self.tokenizer.decode(outputs[0])
        return assistant_output, full_dialog
```
Now I load the model that I want to use as the assistant and instantiate the pipeline:
```python
# model
model_id = "HuggingFaceH4/zephyr-7b-beta"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype=torch.float16)

# pipeline
chatpipe = ChatBufferPipeline(model=model, tokenizer=tokenizer)
```
The interaction with it looks like this (note that I currently return the new assistant message together with the full dialog, including the lookback history, as a tuple so I can debug the chat history):
```python
_, dialog = chatpipe("My favorite color is yellow, what is yours?")
pprint(dialog)
```

```
'<s> <|system|>\n'
'You are a friendly chatbot who answers user questions. You can use the '
'previous examples if this helps you.</s> \n'
'<|user|>\n'
'My favorite color is yellow, what is yours?</s> \n'
'<|assistant|>\n'
"I don't have a favorite color as I'm not capable of having preferences or "
'feelings. However, my design and interface are primarily blue and green, '
'which are calming and soothing colors that help users feel more relaxed and '
'comfortable while interacting with me.</s></s> \n'
```
If I then ask another question, my message is in the buffer and the model uses it to answer.
```python
_, dialog = chatpipe("What did I tell you my favorite color was?", lookback=10)
pprint(dialog)
```

```
'<s> <|system|>\n'
'You are a friendly chatbot who answers user questions. You can use the '
'previous examples if this helps you.</s> \n'
'<|user|>\n'
'My favorite color is yellow, what is yours?</s> \n'
'<|assistant|>\n'
"I don't have a favorite color as I'm not capable of having preferences or "
'feelings. However, my design and interface are primarily blue and green, '
'which are calming and soothing colors that help users feel more relaxed and '
'comfortable while interacting with me.</s></s> \n'
'<|user|>\n'
'What did I tell you my favorite color was?</s> \n'
'<|assistant|>\n'
'You told me that your favorite color is yellow. Is there anything else I can '
'help you with today?</s>'
```
That is my idea of how to get this behavior, but there may well be simpler ways.
The important points are that I created the buffer outside the pipeline so I can empty it whenever I want, and that I used the chat template that ships with the model. You can also create your own chat templates and add them to the model's tokenizer.
The documentation is very helpful for doing that.
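As a minimal sketch, a custom template can be attached like this (the Jinja template string here is purely illustrative, not a recommended format):

```python
# purely illustrative Jinja template; a real template should match the format
# the model was trained on
custom_template = (
    "{% for message in messages %}"
    "<|{{ message['role'] }}|>\n{{ message['content'] }}\n"
    "{% endfor %}"
    "{% if add_generation_prompt %}<|assistant|>\n{% endif %}"
)
tokenizer.chat_template = custom_template

# apply_chat_template now renders messages with the custom template
print(tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hello!"}],
    tokenize=False,
    add_generation_prompt=True,
))
```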
I hope that this idea helps you.