Hi! I’m doing supervised fine-tuning (SFT) of the phi-2 model, but I’ve run into a problem. My goal is to update the model’s knowledge of pandas functions.
Here’s an example of a training pair I’m using:

Prompt: “Section: Series Subsection: string-handling How to: Remove a prefix from an object series? Answer:”

Label: “pandas.Series.str.removeprefix”
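For context, the label itself breaks into several tokens, which is part of what got me wondering about per-token updates. A quick check (purely illustrative):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")

label = "pandas.Series.str.removeprefix"
# The label spans multiple BPE tokens, not a single one.
print(tokenizer.tokenize(label))
```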
What I’d like to do is update the model after every generated token. For example, I’d feed the model the prompt, it would generate the next token (e.g., ‘pan’), and then I’d compute the loss against the corresponding label token and backpropagate to update the weights before generating the following token.
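In code, the loop I have in mind looks roughly like this. It’s only a sketch (I haven’t verified it end to end), for simplicity it appends the reference token rather than the one the model actually generated, and the optimizer settings are arbitrary placeholders:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)  # placeholder hyperparameters

prompt = ("Section: Series Subsection: string-handling "
          "How to: Remove a prefix from an object series? Answer:")
label = " pandas.Series.str.removeprefix"  # leading space so it tokenizes as a continuation

input_ids = tokenizer(prompt, return_tensors="pt").input_ids
label_ids = tokenizer(label, return_tensors="pt").input_ids

for i in range(label_ids.shape[1]):
    # Predict the next token given the current context.
    next_token_logits = model(input_ids).logits[:, -1, :]
    # Loss against the i-th label token, then an optimizer step after every token.
    loss = torch.nn.functional.cross_entropy(next_token_logits, label_ids[:, i])
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    # Append the reference token and move on to the next position.
    input_ids = torch.cat([input_ids, label_ids[:, i : i + 1]], dim=1)
```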
My questions are:
- Does this process make sense for fine-tuning?
- If so, is there a Hugging Face Trainer class that supports this kind of token-by-token fine-tuning? From what I understand, the default Trainer workflow works a bit differently from what I described (see my sketch of it after this list).
- If you have any general tips for this type of task, please share them.
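For reference, my current understanding of the default workflow (please correct me if I’m wrong) is that a single training step processes the whole example at once: the prompt and answer are concatenated, the prompt positions are masked out of the loss, and the cross-entropy over all answer tokens is computed in one forward pass, with one optimizer step per batch rather than per token. Roughly:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")

prompt = ("Section: Series Subsection: string-handling "
          "How to: Remove a prefix from an object series? Answer:")
answer = " pandas.Series.str.removeprefix"  # leading space so it tokenizes as a continuation

prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
answer_ids = tokenizer(answer, return_tensors="pt").input_ids

# One training example = prompt + answer; prompt positions get -100
# so the loss only covers the answer tokens.
input_ids = torch.cat([prompt_ids, answer_ids], dim=1)
labels = torch.cat([torch.full_like(prompt_ids, -100), answer_ids], dim=1)

# Single forward pass: the model shifts the labels internally and averages
# the cross-entropy over all answer positions at once.
loss = model(input_ids=input_ids, labels=labels).loss
loss.backward()
```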