How does LlamaForSequenceClassification determine which class corresponds to which label?

In other words, how does it know that class 0 is the probability of a specific label, say "cat", and not "dog", without fine-tuning? Also, how can I let the model know these details? Do I have to add them to the prompt?

You need to fine-tune it, as far as I understand. See the following link for one possible way to fine-tune it: Text classification
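To add to that: in the transformers library the index-to-name mapping is something you define yourself (typically via the config's `id2label` and `label2id` dictionaries when you set up fine-tuning); the model never infers it from the prompt. A minimal sketch of how an index becomes a label name, with made-up logits and a hypothetical mapping:

```python
import numpy as np

# Hypothetical mapping you would supply when fine-tuning,
# e.g. via the model config's id2label; the indices mean
# nothing until you define them.
id2label = {0: "cat", 1: "dog"}

logits = np.array([2.3, -0.7])  # made-up classifier outputs for one input
predicted = id2label[int(np.argmax(logits))]
print(predicted)  # → cat
```

Until the head is fine-tuned on data labeled with that mapping, the indices are arbitrary, which is why fine-tuning is required.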

Where can I read more about LlamaForSequenceClassification? I would like to know how this class uses the original llama checkpoint for classification tasks i.e., how the computation happens behind the scenes so I can decide whether to use it for my application.

See the LLaMA page linked above for the documentation of LlamaForSequenceClassification.

The docs describe it as "The LLaMa Model transformer with a sequence classification head on top (linear layer)." here. Does that mean only a single linear (dense) layer is added directly on top of LLaMA's outputs, or is a linear layer added to a version of LLaMA customized for sequence classification?

What would the difference be? My guess is that only a single linear (dense) layer is added directly on top of LLaMA's outputs.
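That matches the transformers source: the head is a single bias-free linear layer (`score`), applied to the hidden state of the last non-padding token of the sequence. A numpy sketch of that pooling and projection, with toy dimensions chosen just for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_size, num_labels, seq_len = 8, 2, 5  # toy sizes, not LLaMA's real ones

# Stand-in for the backbone's output for one sequence
hidden_states = rng.normal(size=(seq_len, hidden_size))

# Stand-in for the head's only weights: a (num_labels, hidden_size) matrix
W = rng.normal(size=(num_labels, hidden_size))

# The model pools by taking the last non-padding token's hidden state
pooled = hidden_states[-1]  # assume no padding in this toy example
logits = W @ pooled         # one logit per class
print(logits.shape)         # → (2,)
```

So there is no extra task-specific transformer on top; the "customization" is just this projection plus the pooling rule.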

LlamaForSequenceClassification might fine-tune the body as well, not just the sequence classification head.

Hmm, I think so too. I am trying to understand the source code.

Whatever the output task, almost all of the layers from the input up to the final hidden layer are the same. Depending on the application, you then add a sequence classification layer on top of the last hidden layer, or a text generation layer, or some other type of head, depending on how you intend to use the model. What makes it a Llama model is partly the architecture, but also all of the weights up to the final hidden layer, plus the weights of whichever output head sits on top.
So if you have a Llama model with a sequence classifier on top, it means that all of the layers up to the final hidden layer have been trained by the Llama team, and then a classifier head has been added as the output layer and trained on some arbitrary classifications to give it baseline weights. You would then fine-tune the whole model to make it correctly apply your classifications to your input data. Of course you can use techniques like LoRA to make the training more efficient, but the point is to give you a good baseline to perform fine-tuning on.
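On the LoRA point: the core trick is to keep the pretrained weight matrix frozen and learn only a low-rank update alongside it. A minimal numpy sketch of the forward pass, with toy dimensions and randomly initialized stand-in weights:

```python
import numpy as np

rng = np.random.default_rng(2)
d, r = 8, 2  # hidden size and a much smaller LoRA rank

W = rng.normal(size=(d, d))  # frozen pretrained weight (not updated in training)
A = rng.normal(size=(r, d))  # trainable LoRA factor
B = np.zeros((d, r))         # trainable LoRA factor; starts at zero so the
                             # low-rank update begins as a no-op

x = rng.normal(size=(d,))
y = W @ x + B @ (A @ x)      # LoRA forward: frozen path plus low-rank update

# Before any training, the output equals the frozen model's output
assert np.allclose(y, W @ x)
```

Only `A` and `B` (2 * d * r parameters here instead of d * d) get gradient updates, which is where the efficiency comes from.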
I hope that helps better your understanding.