Training GPT-type models for classification tasks: CausalLM vs SequenceClassification

I see examples like:

Fine-tuning GPT-J for text entailment

where the CausalLM class is used to fine-tune the model on the task instead of the SequenceClassification class. What is the purpose of doing that?
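To make sure I understand the two options, this is roughly what I mean; a minimal sketch, with distilgpt2 as an example checkpoint:

```python
from transformers import (
    AutoModelForCausalLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
)

checkpoint = "distilgpt2"  # example checkpoint; any GPT-type model should work
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# Option 1: language-modeling head; the class label is produced as generated text
lm_model = AutoModelForCausalLM.from_pretrained(checkpoint)

# Option 2: classification head on top of the same transformer; the label is a class index
clf_model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=3)
clf_model.config.pad_token_id = tokenizer.eos_token_id  # GPT-2 models have no pad token by default
```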


I would like to add more context to my question and narrow it down to the problem I am facing while fine-tuning a similar model (distilGPT2) on a custom dataset.

I am trying to figure out how to fine-tune a GPT-type model using the CausalLM class.
First, the "input_ids" are the encoded form of text in the format below:

f"mnli hypothesis: {hypothesis} premise: {premise} target: {class_label} <|endoftext|>"

Second, the training set is grouped in a similar manner to the group_texts function used in the Hugging Face notebook example for fine-tuning causal LM models (the "How to fine-tune a model on language modeling" notebook on huggingface.co), which also creates the "labels" as a copy of the "input_ids" (the one-token shift is handled inside the model when it computes the loss).
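For reference, the grouping step I mean does roughly this (my paraphrase of the notebook's group_texts, with block_size as the chunk length):

```python
block_size = 128  # example chunk length

def group_texts(examples):
    # Concatenate all tokenized texts in the batch, then split them into
    # fixed-size blocks; "labels" are a copy of "input_ids", and the model
    # itself shifts them by one position when computing the loss.
    concatenated = {k: sum(examples[k], []) for k in examples.keys()}
    total_length = len(concatenated["input_ids"])
    total_length = (total_length // block_size) * block_size  # drop the remainder
    result = {
        k: [t[i : i + block_size] for i in range(0, total_length, block_size)]
        for k, t in concatenated.items()
    }
    result["labels"] = result["input_ids"].copy()
    return result
```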

How should I pre-process the eval set? Do I use the same method as on the training set? The concept of grouping the text using the group_texts function is making it hard for me to understand how the model will know that I want it to focus on generating the "class_label" when it is being fine-tuned! The kind of thing I imagine is masking the prompt tokens in the labels, as in the sketch below, but I am not sure if that is the right approach.
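This is my own guess, not something from the notebook: mask the prompt positions with -100 (the ignore index of the loss) so that only the target tokens contribute to the loss.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")

def encode_with_label_mask(hypothesis, premise, class_label, max_length=128):
    # Tokenize the prompt and the target separately so we know where the
    # target starts, then set the prompt positions in "labels" to -100
    # so the loss is only computed on the class_label tokens.
    prompt = f"mnli hypothesis: {hypothesis} premise: {premise} target:"
    target = f" {class_label} <|endoftext|>"
    prompt_ids = tokenizer(prompt)["input_ids"]
    target_ids = tokenizer(target)["input_ids"]
    input_ids = (prompt_ids + target_ids)[:max_length]
    labels = ([-100] * len(prompt_ids) + target_ids)[:max_length]
    return {"input_ids": input_ids, "labels": labels}
```

Is something like this what I should be doing for the eval set (and maybe the training set too), instead of grouping everything into fixed-size blocks?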
