They don't remember what you tell them... do they?

Hi there, fairly basic but important question: I'm just looking for confirmation that models are not affected by anything you give them at inference time. Learning happens at the pre-training and fine-tuning stages, but a model is effectively read-only once you start using it to generate text. Do I understand this correctly?

As I recall from walking through the “Fine-tune a pretrained model” example, all your work is lost unless you save the updated model. If you used a model for inference for a few hours and then saved it, would it be any different?

Just a little context: I want to use data in a RAG app that should not end up affecting the model in any way, say by showing up in later conversations. That is, unless I intentionally implement a “memory” mechanism (which is just passing the conversation history back to the model every time), roughly as sketched below. So for me, this “forgetfulness” of the model is a feature I’m counting on. Yes, I’ve probably watched too many episodes of Westworld.
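Just so it's clear what I mean by “memory,” here is roughly the idea (where `model_generate` is a stand-in for whatever generation call the app actually uses):

```python
# Rough sketch of the "memory" idea: the model itself stays frozen; any apparent
# memory comes from re-sending the conversation history with every request.
history = []

def ask(model_generate, user_message):
    # model_generate is a stand-in for whatever generation call the app uses.
    history.append({"role": "user", "content": user_message})
    prompt = "\n".join(f"{turn['role']}: {turn['content']}" for turn in history)
    reply = model_generate(prompt)  # the model itself is never modified here
    history.append({"role": "assistant", "content": reply})
    return reply
```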

One last question: is this forgetfulness generally considered a feature or a bug? In other words, is this a behavior that might change in the future? Models that learn on the fly?

Thanks so much.

Hi there, fairly basic but important question: I'm just looking for confirmation that models are not affected by anything you give them at inference time. Learning happens at the pre-training and fine-tuning stages, but a model is effectively read-only once you start using it to generate text. Do I understand this correctly?

Correct, nothing whatsoever about the model changes when you’re using it for inference.
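If you want to convince yourself, a quick sanity check along these lines (a sketch assuming the transformers and torch libraries; "gpt2" is just a stand-in checkpoint) will show the weights are bit-identical before and after generation:

```python
# Sanity check: snapshot the parameters, run inference, and confirm nothing changed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"  # stand-in checkpoint for illustration
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

before = {k: v.clone() for k, v in model.state_dict().items()}

with torch.no_grad():
    model.generate(**tok("Do you remember me?", return_tensors="pt"), max_new_tokens=10)

after = model.state_dict()
print(all(torch.equal(before[k], after[k]) for k in before))  # True
```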

As I recall from walking through the “Fine-tune a pretrained model” example, all your work is lost unless you save the updated model. If you used a model for inference for a few hours and then saved it, would it be any different?

No, it would just be a copy of what you started with.

Unless, say, you started with a model in 32-bit precision, loaded it in 16-bit, and then saved the 16-bit version; you could change the model by doing things like that. But nothing about running inference in and of itself changes it at all.
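For example, something like this (again a sketch assuming transformers and torch, with "gpt2" as a stand-in checkpoint) would leave you with genuinely different weights on disk, even though no inference was involved:

```python
# Re-saving the model at a lower precision changes what is on disk;
# running inference never does.
import torch
from transformers import AutoModelForCausalLM

name = "gpt2"  # stand-in checkpoint for illustration
model_fp32 = AutoModelForCausalLM.from_pretrained(name)                              # 32-bit
model_fp16 = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.float16)   # 16-bit

model_fp32.save_pretrained("copy_of_original")  # identical copy of what you started with
model_fp16.save_pretrained("fp16_version")      # lower-precision weights: a changed model
```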

One last question: is this forgetfulness generally considered a feature or a bug? In other words, is this a behavior that might change in the future? Models that learn on the fly?

Depending on your needs, it could be a limitation, but I wouldn’t say it’s a bug. Transformer models are giant functions. If you have a tiny regression model y = mx + b, it’s a function with learned parameters m and b. Nothing about plugging in some value x will change m or b, and in the vast majority of cases you don’t want m and b to change on the fly. Transformers are a massively scaled-up and more complicated version of this, but it’s the same basic idea at the end of the day.
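To make the analogy concrete, here is the toy version in code:

```python
# A "model" is just a function with fixed learned parameters.
m, b = 2.0, 0.5  # parameters fixed at training time

def predict(x):
    # Evaluating the function never modifies m or b.
    return m * x + b

print(predict(3.0))  # 6.5
print((m, b))        # still (2.0, 0.5)
```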

That said, there is a sub-field of AI called “online learning,” and many people and companies have built systems whose models learn on the fly. In those cases, though, they are going out of their way to update the model’s parameters as new data arrives; the model isn’t just changing in and of itself. It’s also more complicated than the classic “train, then serve” paradigm where the model is fixed, and it’s probably not what most people want by default.
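Here’s a minimal sketch of that idea (assuming scikit-learn; the data stream is made up), just to show that the parameters only move because you explicitly call an update on each incoming example:

```python
# Online learning: the model only "learns on the fly" because we explicitly
# call partial_fit on every new example from the stream.
import numpy as np
from sklearn.linear_model import SGDRegressor

model = SGDRegressor()

stream = [(np.array([[1.0]]), np.array([2.1])),
          (np.array([[2.0]]), np.array([3.9])),
          (np.array([[3.0]]), np.array([6.2]))]

for X, y in stream:
    model.partial_fit(X, y)  # explicit parameter update per incoming example

print(model.coef_, model.intercept_)
```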

This could change in the future, I guess: you could have more human-like models that are stateful and learn from everything they see. Who knows.

