Training CausalLM to imitate Seq2SeqModel

I want to train a causal language model to imitate a sequence-to-sequence model. To do this, I only need the trailing output text that comes after the input text.

Let's say, for example:

input_text = "The food is spicy. From the previous text, it is inferred that the text's sentiment is:"
output_text = "Negative"

# The food is spicy. From the previous text, it is inferred that the text's sentiment is: Negative
tokenized_inputs = tokenizer(input_text + ' ' + output_text, **tokenize_args)

This becomes a problem when I need to evaluate and compute metrics for my model each epoch, since DataCollatorForLanguageModeling copies the input ids into the labels. During the evaluation phase, I want the model input to be just the input_text and the label to be the output_text. Is there any way I can make this happen?
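
Conceptually, what I would like the evaluation step to do is something like this (just a rough sketch of the intent with a placeholder checkpoint, not my actual setup):

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder checkpoint
model = AutoModelForCausalLM.from_pretrained("gpt2")

input_text = "The food is spicy. From the previous text, it is inferred that the text's sentiment is:"
output_text = "Negative"

# Evaluation: the model only sees the prompt ...
prompt_ids = tokenizer(input_text, return_tensors="pt").input_ids
generated = model.generate(prompt_ids, max_new_tokens=5)

# ... and only the continuation after the prompt is kept as the prediction.
prediction = tokenizer.decode(generated[0, prompt_ids.shape[1]:], skip_special_tokens=True)
print(prediction)  # the metric should compare this against output_text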


Hi, thank you for posting the question. I am facing a similar problem and wondering whether it makes sense to use a CausalLM to imitate a Seq2SeqLM. In general, how are people training for seq2seq tasks, given that newer models like Microsoft Phi-3 are based on a causal LM architecture?


The TOFU repository (GitHub: locuslab/tofu) has an example of this.
They call a CausalLM model with a labels argument and set all label positions corresponding to the input to -100. If you must pass the same tensor as both the input and the label, you could also work from the output logits: subset the positions corresponding to the output tokens only and compute the loss manually.
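
For the training side, a minimal sketch of that labels trick could look like this (placeholder checkpoint and variable names, not the TOFU code itself):

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder checkpoint
model = AutoModelForCausalLM.from_pretrained("gpt2")

input_text = "The food is spicy. From the previous text, it is inferred that the text's sentiment is:"
output_text = "Negative"

# Tokenize the prompt alone and the full prompt+answer sequence, so the
# prompt length tells us which label positions to mask.
prompt_len = tokenizer(input_text, return_tensors="pt").input_ids.shape[1]
full = tokenizer(input_text + ' ' + output_text, return_tensors="pt")

# Labels are a copy of the input ids with every prompt position set to -100.
# (This assumes the prompt tokenizes to the same ids when the answer is appended.)
labels = full.input_ids.clone()
labels[:, :prompt_len] = -100

outputs = model(**full, labels=labels)
print(outputs.loss)

Because -100 is the default ignore_index of PyTorch's cross-entropy loss, the masked prompt positions contribute nothing to the loss or gradient, which is what makes the causal LM objective behave like a seq2seq one.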
