They don't remember what you tell them... do they?

Hi there, fairly basic but important question: I'm just looking for confirmation that models are not affected by anything you give them at inference time. Learning happens at the pre-training and fine-tuning stages, but a model is effectively read-only once you start using it to generate text. Do I understand this correctly?

As I recall from walking through the “Fine-tune a pretrained model” example, all your work is lost unless you save the updated model. If you used a model for inference for a few hours and then saved it, would it be any different?

Just a little context: I want to use data in a RAG app that should not end up affecting the model in any way, say by showing up in later conversations. That is, unless I intentionally implement a “memory” mechanism (which is just passing the conversation history back to the model every time), roughly as sketched below. So for me, this “forgetfulness” of the model is a feature I’m counting on. Yes, I’ve probably watched too many episodes of Westworld.
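Just so it's clear what I mean by “memory,” here is roughly the idea (where `model_generate` is a stand-in for whatever generation call the app actually uses):

```python
# Rough sketch of the "memory" idea: the model itself stays frozen; any apparent
# memory comes from re-sending the conversation history with every request.
history = []

def ask(model_generate, user_message):
    # model_generate is a stand-in for whatever generation call the app uses.
    history.append({"role": "user", "content": user_message})
    prompt = "\n".join(f"{turn['role']}: {turn['content']}" for turn in history)
    reply = model_generate(prompt)  # the model itself is never modified here
    history.append({"role": "assistant", "content": reply})
    return reply
```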

One last question: is this forgetfulness generally considered a feature or a bug? In other words, is this a behavior that might change in the future? Models that learn on the fly?

Thanks so much.

Hi there, fairly basic but important question: I'm just looking for confirmation that models are not affected by anything you give them at inference time. Learning happens at the pre-training and fine-tuning stages, but a model is effectively read-only once you start using it to generate text. Do I understand this correctly?

Correct, nothing whatsoever about the model changes when you’re using it for inference.
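If you want to convince yourself, a quick sanity check along these lines (a sketch assuming the transformers and torch libraries; "gpt2" is just a stand-in checkpoint) will show the weights are bit-identical before and after generation:

```python
# Sanity check: snapshot the parameters, run inference, and confirm nothing changed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"  # stand-in checkpoint for illustration
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

before = {k: v.clone() for k, v in model.state_dict().items()}

with torch.no_grad():
    model.generate(**tok("Do you remember me?", return_tensors="pt"), max_new_tokens=10)

after = model.state_dict()
print(all(torch.equal(before[k], after[k]) for k in before))  # True
```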

As I recall from walking through the “Fine-tune a pretrained model” example, all your work is lost unless you save the updated model. If you used a model for inference for a few hours and then saved it, would it be any different?

No, it would just be a copy of what you started with.

Unless, say, you started with a model in 32-bit precision, loaded it in 16-bit, and then saved the 16-bit version; you could change the model by doing things like that. But nothing about running inference in and of itself changes it at all.
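For example, something like this (again a sketch assuming transformers and torch, with "gpt2" as a stand-in checkpoint) would leave you with genuinely different weights on disk, even though no inference was involved:

```python
# Re-saving the model at a lower precision changes what is on disk;
# running inference never does.
import torch
from transformers import AutoModelForCausalLM

name = "gpt2"  # stand-in checkpoint for illustration
model_fp32 = AutoModelForCausalLM.from_pretrained(name)                              # 32-bit
model_fp16 = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.float16)   # 16-bit

model_fp32.save_pretrained("copy_of_original")  # identical copy of what you started with
model_fp16.save_pretrained("fp16_version")      # lower-precision weights: a changed model
```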

One last question: is this forgetfulness generally considered a feature or a bug? In other words, is this a behavior that might change in the future? Models that learn on the fly?

Depending on your needs, it could be a limitation, but I wouldn’t say it’s a bug. Transformer models are giant functions. If you have a tiny regression model y = mx + b, it’s a function with learned parameters m and b. Nothing about plugging in some value x will change m or b, and in the vast majority of cases you don’t want m and b to change on the fly. Transformers are a massively scaled-up and more complicated version of this, but it’s the same basic idea at the end of the day.
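To make the analogy concrete, here is the toy version in code:

```python
# A "model" is just a function with fixed learned parameters.
m, b = 2.0, 0.5  # parameters fixed at training time

def predict(x):
    # Evaluating the function never modifies m or b.
    return m * x + b

print(predict(3.0))  # 6.5
print((m, b))        # still (2.0, 0.5)
```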

That said, there is a sub-field of AI called “online learning,” and many people and companies have built systems whose models learn on the fly. In those cases, though, they are going out of their way to update the model’s parameters as new data arrives; the model isn’t just changing in and of itself. It’s also more complicated than the classic “train, then serve” paradigm where the model is fixed, and it’s probably not what most people want by default.
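Here’s a minimal sketch of that idea (assuming scikit-learn; the data stream is made up), just to show that the parameters only move because you explicitly call an update on each incoming example:

```python
# Online learning: the model only "learns on the fly" because we explicitly
# call partial_fit on every new example from the stream.
import numpy as np
from sklearn.linear_model import SGDRegressor

model = SGDRegressor()

stream = [(np.array([[1.0]]), np.array([2.1])),
          (np.array([[2.0]]), np.array([3.9])),
          (np.array([[3.0]]), np.array([6.2]))]

for X, y in stream:
    model.partial_fit(X, y)  # explicit parameter update per incoming example

print(model.coef_, model.intercept_)
```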

This could change in the future, I guess: you could have more human-like models that are stateful and learn from everything they see. Who knows.

