Training a large language model to consider two texts when generating output text

I have a dataset with the following columns:

{text-in, context, text-out}

I want to train a large language model, say GPT-2 or T5, to learn to generate text-out from an input comprising text-in and context. How do I do this? Will it work if I simply feed text-in and context to the model during fine-tuning with some separator (say "<text-in>: text-in, <context>: context")?
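To make the idea concrete, here is a minimal sketch of the formatting I have in mind; the separator strings and the function name are just placeholders I made up, not anything standard:

```python
def format_example(text_in: str, context: str, text_out: str):
    """Build a (source, target) pair for seq2seq fine-tuning.

    The source concatenates text-in and context with ad-hoc separator
    markers; the target is simply text-out.
    """
    source = f"<text-in>: {text_in} <context>: {context}"
    target = text_out
    return source, target

# Example row from the dataset:
src, tgt = format_example(
    "What is the capital?",
    "France is a country in Europe.",
    "Paris",
)
print(src)  # <text-in>: What is the capital? <context>: France is a country in Europe.
print(tgt)  # Paris
```

The tokenized source would then be the encoder input (for T5) or the prompt prefix (for GPT-2), with the target as the text to be generated.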