How does LlamaForSequenceClassification determine which class corresponds to which label?

In other words, how does it know that class 0 is the probability of a specific label, say "cat", and not "dog", without fine-tuning? Also, how can I let the model know these details? Do I have to add them to the prompt?

You need to fine-tune it, as far as I understand. See the following link for one possible way to fine-tune it: Text classification
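To add to that: in the transformers library the index-to-name mapping is something you define yourself (typically via the config's `id2label` and `label2id` dictionaries when you set up fine-tuning); the model never infers it from the prompt. A minimal sketch of how an index becomes a label name, with made-up logits and a hypothetical mapping:

```python
import numpy as np

# Hypothetical mapping you would supply when fine-tuning,
# e.g. via the model config's id2label; the indices mean
# nothing until you define them.
id2label = {0: "cat", 1: "dog"}

logits = np.array([2.3, -0.7])  # made-up classifier outputs for one input
predicted = id2label[int(np.argmax(logits))]
print(predicted)  # → cat
```

Until the head is fine-tuned on data labeled with that mapping, the indices are arbitrary, which is why fine-tuning is required.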

Where can I read more about LlamaForSequenceClassification? I would like to know how this class uses the original llama checkpoint for classification tasks i.e., how the computation happens behind the scenes so I can decide whether to use it for my application.

See the LLaMA page linked above for the documentation of LlamaForSequenceClassification.

The docs describe it as "The LLaMa Model transformer with a sequence classification head on top (linear layer)." here. Does that mean only a single linear (dense) layer is added directly on top of LLaMA's outputs, or is a linear layer added to a version of LLaMA customized for sequence classification?

What would the difference be? My guess is that only a single linear (dense) layer is added directly on top of LLaMA's outputs.
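That matches the transformers source: the head is a single bias-free linear layer (`score`), applied to the hidden state of the last non-padding token of the sequence. A numpy sketch of that pooling and projection, with toy dimensions chosen just for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_size, num_labels, seq_len = 8, 2, 5  # toy sizes, not LLaMA's real ones

# Stand-in for the backbone's output for one sequence
hidden_states = rng.normal(size=(seq_len, hidden_size))

# Stand-in for the head's only weights: a (num_labels, hidden_size) matrix
W = rng.normal(size=(num_labels, hidden_size))

# The model pools by taking the last non-padding token's hidden state
pooled = hidden_states[-1]  # assume no padding in this toy example
logits = W @ pooled         # one logit per class
print(logits.shape)         # → (2,)
```

So there is no extra task-specific transformer on top; the "customization" is just this projection plus the pooling rule.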

LlamaForSequenceClassification might fine-tune the body as well, not just the sequence classification head.

Hmm, I think so too. I am trying to understand the source code.

Whatever the output task, almost all of the layers from the input up to the final hidden layer are the same. Depending on the application, you then add a sequence classification layer on top of the last hidden layer, or a text generation layer, or some other type of head, depending on how you intend to use the model. What makes it a Llama model is partly the architecture, but also all of the weights up to the final hidden layer, plus the weights of whichever output head sits on top.
So if you have a Llama model with a sequence classifier on top, it means that all of the layers up to the final hidden layer have been trained by the Llama team, and then a classifier head has been added as the output layer and trained on some arbitrary classifications to give it baseline weights. You would then fine-tune the whole model to make it correctly apply your classifications to your input data. Of course you can use techniques like LoRA to make the training more efficient, but the point is to give you a good baseline to perform fine-tuning on.
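On the LoRA point: the core trick is to keep the pretrained weight matrix frozen and learn only a low-rank update alongside it. A minimal numpy sketch of the forward pass, with toy dimensions and randomly initialized stand-in weights:

```python
import numpy as np

rng = np.random.default_rng(2)
d, r = 8, 2  # hidden size and a much smaller LoRA rank

W = rng.normal(size=(d, d))  # frozen pretrained weight (not updated in training)
A = rng.normal(size=(r, d))  # trainable LoRA factor
B = np.zeros((d, r))         # trainable LoRA factor; starts at zero so the
                             # low-rank update begins as a no-op

x = rng.normal(size=(d,))
y = W @ x + B @ (A @ x)      # LoRA forward: frozen path plus low-rank update

# Before any training, the output equals the frozen model's output
assert np.allclose(y, W @ x)
```

Only `A` and `B` (2 * d * r parameters here instead of d * d) get gradient updates, which is where the efficiency comes from.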
I hope that helps better your understanding.