How does the API inference work on models such as Blenderbot?

I assume models like Blenderbot need to look at prior inputs and outputs in order to stay consistent across turns. How does the inference API provide that history to the model?

Hey, I’m dealing with the same subject.

As far as I understand, there is a way to provide the context of the previous text in the conversation.
The details are here:
https://api-inference.huggingface.co/docs/python/html/detailed_parameters.html#conversational-task
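From my reading of that page, you pass the history alongside the new message in the request body. Here is a minimal sketch of how I build the payload; the model id and token are placeholders, and `build_payload` is just a helper I wrote, not part of any library:

```python
import json

# Placeholder endpoint/model id -- swap in the model you are actually using.
API_URL = "https://api-inference.huggingface.co/models/facebook/blenderbot-1B-distill"

def build_payload(past_user_inputs, generated_responses, new_text):
    """Package the conversation history in the conversational-task shape."""
    return {
        "inputs": {
            "past_user_inputs": past_user_inputs,        # earlier user turns
            "generated_responses": generated_responses,  # earlier model replies
            "text": new_text,                            # the new user turn
        }
    }

payload = build_payload(
    ["Which movie is the best?"],
    ["It's Die Hard for sure."],
    "Can you explain why?",
)
print(json.dumps(payload))
# You would then POST it with your token, e.g.:
# requests.post(API_URL, headers={"Authorization": f"Bearer {API_TOKEN}"}, json=payload)
```

So each request resends the whole conversation so far; the API itself is stateless as far as I can tell.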

However, when I tried it with the 1B model, I got the following error:
“Cutting history off because it’s too long (36 > 28) for underlying model”

I don’t know if this is a limitation of the model or the API.
If you find a solution, let me know.
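One workaround I've been experimenting with (not from the docs, just a client-side idea) is to trim the history myself before sending it, so the API never hits the model's length limit. `max_turns` here is a guess you would tune for the model:

```python
def trim_history(past_user_inputs, generated_responses, max_turns=5):
    """Keep only the most recent turns of the conversation.

    The underlying model has a fixed context length, so dropping the
    oldest turns client-side avoids the server truncating arbitrarily.
    """
    return past_user_inputs[-max_turns:], generated_responses[-max_turns:]

# Example: a 10-turn conversation trimmed to the last 5 turns.
users = [f"user turn {i}" for i in range(10)]
bots = [f"bot turn {i}" for i in range(10)]
users, bots = trim_history(users, bots)
print(len(users), users[0])  # 5 "user turn 5"
```

The trade-off is that the model forgets anything older than `max_turns`, but at least you control what gets dropped.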