Training GPT-type models for classification tasks: CausalLM vs SequenceClassification

I see examples like:

Fine-tuning GPT-J for text entailment

In such examples, the CausalLM class is used to fine-tune the model on the task instead of the SequenceClassification class. What is the purpose of doing that?
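The practical difference lies mostly in the head on top of the transformer. The toy sketch below uses pure Python, made-up weights and a hypothetical 5-word vocabulary. It is not the transformers API. A SequenceClassification-style head maps the final token's hidden state to one logit per class. A CausalLM head maps it to one logit per vocabulary token, so the class has to be written out as text and read back from the scores of the label tokens.

```python
# Toy contrast between the two heads; all sizes, weights and words are made up.
from typing import List


def matvec(weights: List[List[float]], x: List[float]) -> List[float]:
    """Multiply a (rows x dim) weight matrix by a dim-sized vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def argmax(xs: List[float]) -> int:
    return max(range(len(xs)), key=lambda i: xs[i])


# Hypothetical final hidden state of the last token (hidden size 2).
last_hidden = [1.0, -0.5]

# SequenceClassification-style head: hidden -> num_labels logits (3 NLI classes).
cls_head = [[0.2, 0.1], [1.0, 0.0], [-0.3, 0.4]]
cls_logits = matvec(cls_head, last_hidden)
predicted_class = argmax(cls_logits)  # an index into the class list

# CausalLM-style head: hidden -> vocabulary logits. The class is recovered
# by comparing the logits of the tokens that spell each label.
vocab = ["the", "entailment", "neutral", "contradiction", "<|endoftext|>"]
lm_head = [[0.0, 0.0], [0.2, 0.1], [1.0, 0.0], [-0.3, 0.4], [0.5, 0.5]]
vocab_logits = matvec(lm_head, last_hidden)
label_words = ["entailment", "neutral", "contradiction"]
label_scores = [vocab_logits[vocab.index(w)] for w in label_words]
predicted_word = label_words[argmax(label_scores)]  # a word, not an index
```

With the CausalLM approach the model keeps its pretrained language-modeling head and its vocabulary. The price is that "classification" becomes "generate the right label text".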

I would like to add more context to my question and narrow it down to the problem I am facing when fine-tuning a similar model (distilGPT2) on a custom dataset.

I am trying to figure out how to fine-tune a GPT-type model using the CausalLM class.
First, the "input_ids" are the tokenized form of text in the following format:

f"mnli hypothesis: {hypothesis} premise: {premise} target: {class_label} <|endoftext|>"

Second, the training set is grouped in a similar way to the group_texts function used in the Hugging Face notebook example for fine-tuning CausalLM models (Google Colab: "How to fine-tune a model on language modeling"). That function also creates the "labels" as a copy of the input_ids. The one-token shift is applied inside the model when the loss is computed.
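The grouping step can be sketched in pure Python, modeled on the notebook's `group_texts` logic. Here `block_size` is an explicit parameter, and `shifted_pairs` is a hypothetical helper that shows which (input, target) token pairs the causal LM loss compares. The sketch shows why grouping obscures the classification target: a block boundary can fall anywhere, so one block may hold the end of one example and the start of the next.

```python
from typing import Dict, List, Tuple


def group_texts(examples: Dict[str, List[List[int]]], block_size: int) -> Dict[str, List[List[int]]]:
    # Concatenate every tokenized example into one long stream per key.
    concatenated = {k: [tok for seq in seqs for tok in seq] for k, seqs in examples.items()}
    total_length = len(concatenated["input_ids"])
    # Drop the remainder that does not fill a whole block.
    total_length = (total_length // block_size) * block_size
    result = {
        k: [t[i : i + block_size] for i in range(0, total_length, block_size)]
        for k, t in concatenated.items()
    }
    # Labels are a plain copy; the one-token shift happens inside the model's loss.
    result["labels"] = [list(block) for block in result["input_ids"]]
    return result


def shifted_pairs(input_ids: List[int], labels: List[int]) -> List[Tuple[int, int]]:
    """Hypothetical helper: the prediction at position t is scored against label t+1."""
    return list(zip(input_ids[:-1], labels[1:]))


examples = {
    "input_ids": [[1, 2, 3], [4, 5], [6, 7, 8, 9]],
    "attention_mask": [[1, 1, 1], [1, 1], [1, 1, 1, 1]],
}
grouped = group_texts(examples, block_size=4)
# The first block mixes example 1 (tokens 1-3) with the start of example 2 (token 4).
```

Every token in every block is a prediction target, so the label tokens get no special weight compared with the hypothesis and premise tokens.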

How should I preprocess the eval set? Should I use the same method as for the training set? Grouping the text with the group_texts function makes it hard for me to understand how the model will know that I want it to focus on generating the "class_label" during fine-tuning!
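One common alternative to grouping is to keep each example as its own padded sequence. In that setup, the prompt tokens get the label value -100, which the PyTorch/transformers cross-entropy loss ignores, so only the label tokens are trained on. The sketch below shows this in pure Python. It assumes a hypothetical whitespace tokenizer (`toy_tokenize`) in place of the real GPT-2 tokenizer, and hypothetical helper names throughout. For evaluation, the same per-example function would give a comparable eval loss. For accuracy, one would feed only the prompt (ending at `target:`), generate, and compare the generated label with the gold one.

```python
from typing import Dict, List

IGNORE_INDEX = -100  # label value ignored by the cross-entropy loss in PyTorch / transformers


def toy_tokenize(text: str, vocab: Dict[str, int]) -> List[int]:
    """Hypothetical stand-in for a real tokenizer: whitespace split, unseen words get new ids."""
    ids = []
    for word in text.split():
        if word not in vocab:
            vocab[word] = len(vocab)
        ids.append(vocab[word])
    return ids


def build_train_example(prompt: str, target: str, vocab: Dict[str, int], pad_id: int, max_len: int):
    prompt_ids = toy_tokenize(prompt, vocab)
    target_ids = toy_tokenize(target, vocab)
    input_ids = (prompt_ids + target_ids)[:max_len]
    # Only the target tokens contribute to the loss; the prompt is masked out.
    labels = ([IGNORE_INDEX] * len(prompt_ids) + target_ids)[:max_len]
    attention_mask = [1] * len(input_ids)
    pad = max_len - len(input_ids)
    return {
        "input_ids": input_ids + [pad_id] * pad,
        "attention_mask": attention_mask + [0] * pad,
        "labels": labels + [IGNORE_INDEX] * pad,  # padding is ignored too
    }


def build_eval_prompt(prompt: str, vocab: Dict[str, int]) -> List[int]:
    """Generation-based evaluation: only the prompt is fed; the label is generated."""
    return toy_tokenize(prompt, vocab)


vocab = {"<|endoftext|>": 0}
prompt = "mnli hypothesis: A premise: B target:"
ex = build_train_example(prompt, "entailment <|endoftext|>", vocab, pad_id=0, max_len=12)
eval_ids = build_eval_prompt(prompt, vocab)
```

Here the pad id reuses the end-of-text id, as is often done for GPT-2, which has no dedicated pad token. The end-of-text token that belongs to the target still carries a real label, while padding positions are masked.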