Deployed GPT-2 models vs "Model Card" question

Hi, I’ve noticed a difference in behavior between a GPT-2-based conversational chatbot deployed as a Discord bot and its “Model Card” page (not sure if that is the correct term).

On the Model Card page, when you repeat the same input (e.g., “Hi”), Barney says something different every time. In the Discord version, he replies the same way every time.

Ideally the Discord bot would behave the same way as his Model Card. Thoughts? THANKS!!! :pray:

Hello :wave:

The inference widget and your bot in Discord might be using different temperature and sampling parameters (here’s a great blog post on generation strategies if you’re interested, btw). At least that’s my guess.
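
Here’s a rough sketch of what I mean, using the standard transformers `generate()` API. The `"gpt2"` checkpoint and the exact parameter values are just placeholders; your bot’s code is probably different, but the deterministic vs. sampled contrast is the key bit:

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")  # swap in your fine-tuned model id
model = GPT2LMHeadModel.from_pretrained("gpt2")

inputs = tokenizer("Hi", return_tensors="pt")

# Deterministic: greedy decoding (do_sample=False) returns the same reply
# for the same prompt every time — like your Discord bot seems to behave.
greedy = model.generate(
    **inputs,
    max_new_tokens=30,
    do_sample=False,
    pad_token_id=tokenizer.eos_token_id,
)

# Sampled: with do_sample=True and a temperature, each call can return a
# different reply — which is how the inference widget typically behaves.
sampled = model.generate(
    **inputs,
    max_new_tokens=30,
    do_sample=True,
    temperature=0.9,
    top_k=50,
    top_p=0.95,
    pad_token_id=tokenizer.eos_token_id,
)

print(tokenizer.decode(greedy[0], skip_special_tokens=True))
print(tokenizer.decode(sampled[0], skip_special_tokens=True))
```

If your Discord bot calls `generate()` without `do_sample=True` (or with a very low temperature), it will keep giving the same answer, so enabling sampling there should make it match the widget more closely.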