Training CausalLM to imitate Seq2SeqModel

I want to train a causal language model to imitate a sequence-to-sequence model. To do this, I only need the trailing output text that comes after the input text.

Let's say, for example:

input_text = "The food is spicy. From the previous text, it is inferred that the text's sentiment is:"
output_text = "Negative"

# The food is spicy. From the previous text, it is inferred that the text's sentiment is: Negative
tokenized_inputs = tokenizer(input_text + ' ' + output_text, **tokenize_args)

This becomes a problem when I need to evaluate and compute metrics for my model each epoch, since DataCollatorForLanguageModeling copies the input ids into the labels. During the evaluation phase, I want the model input to be just the input_text and the label to be the output_text. Is there any way I can make this happen?
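
Conceptually, what I would like the evaluation step to do is something like this (just a rough sketch of the intent with a placeholder checkpoint, not my actual setup):

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder checkpoint
model = AutoModelForCausalLM.from_pretrained("gpt2")

input_text = "The food is spicy. From the previous text, it is inferred that the text's sentiment is:"
output_text = "Negative"

# Evaluation: the model only sees the prompt ...
prompt_ids = tokenizer(input_text, return_tensors="pt").input_ids
generated = model.generate(prompt_ids, max_new_tokens=5)

# ... and only the continuation after the prompt is kept as the prediction.
prediction = tokenizer.decode(generated[0, prompt_ids.shape[1]:], skip_special_tokens=True)
print(prediction)  # the metric should compare this against output_text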


Hi, thank you for posting the question. I am facing a similar problem and wondering whether it makes sense to use a CausalLM to imitate a Seq2SeqLM. In general, how are people training for seq2seq tasks, given that newer models like Microsoft Phi-3 are based on a causal LM architecture?


The TOFU repository (GitHub: locuslab/tofu) has an example of this.
They call a CausalLM model with a labels argument and set all label positions corresponding to the input to -100. If you must pass the same tensor as both the input and the label, you could also work from the output logits: subset the positions corresponding to the output tokens only and compute the loss manually.
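
For the training side, a minimal sketch of that labels trick could look like this (placeholder checkpoint and variable names, not the TOFU code itself):

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder checkpoint
model = AutoModelForCausalLM.from_pretrained("gpt2")

input_text = "The food is spicy. From the previous text, it is inferred that the text's sentiment is:"
output_text = "Negative"

# Tokenize the prompt alone and the full prompt+answer sequence, so the
# prompt length tells us which label positions to mask.
prompt_len = tokenizer(input_text, return_tensors="pt").input_ids.shape[1]
full = tokenizer(input_text + ' ' + output_text, return_tensors="pt")

# Labels are a copy of the input ids with every prompt position set to -100.
# (This assumes the prompt tokenizes to the same ids when the answer is appended.)
labels = full.input_ids.clone()
labels[:, :prompt_len] = -100

outputs = model(**full, labels=labels)
print(outputs.loss)

Because -100 is the default ignore_index of PyTorch's cross-entropy loss, the masked prompt positions contribute nothing to the loss or gradient, which is what makes the causal LM objective behave like a seq2seq one.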
