I see examples like:
Fine-tuning GPT-J for text entailment
where the CausalLM class is used to fine-tune the model on the task instead of the SequenceClassification class. What is the purpose of doing that?
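For reference, here is a sketch of the two options as I understand them (both classes come from the transformers library; "distilgpt2" is just a stand-in for whichever checkpoint is used):

```python
from transformers import (
    AutoModelForCausalLM,
    AutoModelForSequenceClassification,
)

# Option A: treat entailment as text generation -- the model is trained
# to produce the class label as ordinary next-token prediction.
causal_model = AutoModelForCausalLM.from_pretrained("distilgpt2")

# Option B: treat entailment as classification -- a classification head
# sits on top of the transformer and is trained with cross-entropy over
# the three MNLI classes.
seq_model = AutoModelForSequenceClassification.from_pretrained(
    "distilgpt2", num_labels=3
)
```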
I would like to add more context to my question and narrow it down to the problem I am facing while fine-tuning a similar model (distilGPT2) on a custom dataset.
I am trying to figure out how to fine-tune a GPT-style model using the CausalLM class.
First, the "input_ids" are the encoded form of the text in the format below:
f"mnli hypothesis: {hypothesis} premise: {premise} target: {class_label} <|endoftext|>"
Second, the training set is grouped in a similar manner to the group_texts function used in the Hugging Face notebook example for fine-tuning CausalLM models (the "How to fine-tune a model on language modeling" notebook on huggingface.co), which also creates the "labels" from the "input_ids" (the model shifts them by one token internally when computing the loss). A sketch of that grouping step follows below.
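This is roughly what my grouping step looks like, modelled on the notebook's group_texts function (block_size is a value I chose; the notebook uses 128):

```python
block_size = 128

def group_texts(examples):
    # Concatenate all token lists in the batch into one long sequence.
    concatenated = {k: sum(examples[k], []) for k in examples.keys()}
    total_length = len(concatenated[list(examples.keys())[0]])
    # Drop the remainder so every chunk is exactly block_size tokens.
    total_length = (total_length // block_size) * block_size
    result = {
        k: [t[i : i + block_size] for i in range(0, total_length, block_size)]
        for k, t in concatenated.items()
    }
    # The labels are a copy of input_ids; the causal LM shifts them by
    # one position internally when it computes the loss.
    result["labels"] = result["input_ids"].copy()
    return result
```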
How should I pre-process the eval set? Do I use the same method as on the training set? The concept of grouping the text with the group_texts function makes it hard for me to understand how the model will know that I want it to focus on generating the "class_label" while it is being fine-tuned!
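One idea I am considering (my own sketch, not from the notebook; the field names just follow my format string above): skip grouping for this task and tokenize each example on its own, masking the prompt tokens in "labels" with -100 so that only the "target: {class_label}" tokens contribute to the loss:

```python
def preprocess(example, tokenizer, max_length=128):
    # Everything up to and including "target:" is the prompt.
    prompt = (
        f"mnli hypothesis: {example['hypothesis']} "
        f"premise: {example['premise']} target:"
    )
    target = f" {example['class_label']} <|endoftext|>"

    prompt_ids = tokenizer(prompt)["input_ids"]
    target_ids = tokenizer(target)["input_ids"]

    input_ids = (prompt_ids + target_ids)[:max_length]
    # -100 is ignored by the cross-entropy loss, so only the label
    # tokens are actually trained on.
    labels = ([-100] * len(prompt_ids) + target_ids)[:max_length]
    return {"input_ids": input_ids, "labels": labels}
```

At eval time I could then generate from everything up to "target:" and compare the decoded continuation against the class label.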
Hi, I currently have the same question. Could you share your solution?