Token-by-Token Fine-Tuning of the phi-2 Model for Code Generation

Hi! I’m doing supervised fine-tuning (SFT) of the phi-2 model, but I’ve run into a problem. My goal is to update the model’s knowledge of Pandas functions.

Here’s an example of the prompt I’m using:

“Section: Series Subsection: string-handling How to: Remove a prefix from an object series? Answer:”

The corresponding label is:

“pandas.Series.str.removeprefix”

What I’d like to do is update the model after each generated token. For example, I would give the model the prompt, it would generate the next token (e.g., ‘pan’), and then I’d compute the loss on that single token and backpropagate to update the weights before generating the next one.
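If I understand the standard causal-LM objective correctly, one forward pass with teacher forcing already produces a loss at every answer position, and the usual training loss is just the mean of the per-token cross-entropies I’d otherwise compute one at a time. The only real difference in my scheme is that stepping the optimizer after each token changes the weights mid-sequence. Here’s a toy sketch of that equivalence (hand-picked logits standing in for a real model’s output):

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def token_loss(logits, target_id):
    # Cross-entropy for a single position: -log p(target)
    return -math.log(softmax(logits)[target_id])

# Toy vocabulary of 4 tokens; pretend the model produced these logits
# at three answer positions (teacher forcing: the gold prefix is fed
# in regardless of what the model would have sampled).
logits_per_position = [
    [2.0, 0.5, -1.0, 0.1],
    [0.3, 1.7, 0.0, -0.5],
    [-0.2, 0.4, 2.2, 0.8],
]
gold_ids = [0, 1, 2]

# "Token-by-token": one loss per generated position
per_token = [token_loss(l, t) for l, t in zip(logits_per_position, gold_ids)]

# Standard causal-LM training loss: the mean over those same positions,
# computed from a single forward pass
full_sequence = sum(per_token) / len(per_token)

print(per_token)
print(full_sequence)
```

So unless the per-token optimizer steps in between are the point, the single-pass loss should cover the same signal.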

My questions are:

  • Does this process make sense for fine-tuning?
  • If so, is there a Trainer class from Hugging Face that supports this kind of token-by-token fine-tuning? From what I understand, the default Trainer workflow might work a bit differently than what I described.
  • If you have any general tips for this type of task, please share them.
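For reference, here is my understanding of what the default Trainer workflow expects: prompt and answer concatenated into one sequence, with the prompt positions masked to -100 so the loss is computed only on the answer tokens. A minimal sketch with a toy whitespace tokenizer (the real code would use phi-2’s tokenizer via `AutoTokenizer`):

```python
# Toy whitespace "tokenizer" for illustration only; the real setup
# would use AutoTokenizer.from_pretrained("microsoft/phi-2").
vocab = {}
def encode(text):
    ids = []
    for tok in text.split():
        ids.append(vocab.setdefault(tok, len(vocab)))
    return ids

prompt = ("Section: Series Subsection: string-handling "
          "How to: Remove a prefix from an object series? Answer:")
answer = "pandas.Series.str.removeprefix"

prompt_ids = encode(prompt)
answer_ids = encode(answer)

# Standard SFT sample: one sequence, with prompt positions excluded
# from the loss via the -100 ignore label
input_ids = prompt_ids + answer_ids
labels = [-100] * len(prompt_ids) + answer_ids

print(input_ids)
print(labels)
```

If my token-by-token idea is redundant, I’d happily fall back to this.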