Instruction fine-tuning in plain PyTorch

Hi everyone!

I would like to instruction fine-tune “Zephyr-7B” to make it hallucinate a little less (as the generative part of a RAG pipeline), i.e. so that it answers with “I don’t know” when no context is given, instead of making things up.

However, I would like to do the fine-tuning in a plain PyTorch loop. Would I then prepare my data the same way as for causal language modeling?
That is, my input would be my prompt (context, question and answer) and my labels would be identical to my input (transformers then automatically shifts the labels by one token).
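To make the question more concrete, this is roughly what I have in mind — just a sketch, not working training code: `samples` is a placeholder for my context/question/answer data, and I'm assuming the zephyr-7b-beta checkpoint with its chat template:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "HuggingFaceH4/zephyr-7b-beta"  # assuming the beta checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
model.train()

def build_example(sample):
    # sample is a placeholder dict with "context", "question", "answer"
    messages = [
        {"role": "user", "content": f"Context:\n{sample['context']}\n\nQuestion: {sample['question']}"},
        {"role": "assistant", "content": sample["answer"]},
    ]
    text = tokenizer.apply_chat_template(messages, tokenize=False)
    enc = tokenizer(text, truncation=True, max_length=1024, return_tensors="pt")
    enc["labels"] = enc["input_ids"].clone()  # labels = inputs; the model shifts them internally
    return enc

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

for sample in samples:  # batch size 1 just to keep the sketch short
    batch = {k: v.to(model.device) for k, v in build_example(sample).items()}
    out = model(**batch)   # loss is computed against the (shifted) labels
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

Is that the right way to think about it, or is there more to it than setting labels equal to the input ids?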

In short, is instruction fine-tuning the same as causal language modeling, except that I train my model to generate text that follows my instruction template?

Thanks in advance for any help!

Kind regards
Christopher

EDIT: I think I probably found the solution here.