Trying to understand the task-specific head for diff. models + Transformers AutoModel

Hi! I’m new to Hugging Face, and I need to use the models to get embeddings from text. While exploring the code and model checkpoints for this, I came across some code in the docs and got confused. Please let me know if I’m thinking in the right direction.
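For context, this is roughly how I’m getting the embeddings at the moment (a minimal sketch with AutoModel; the mean-pooling choice is just my own assumption, not something from the docs):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModel.from_pretrained("roberta-base")  # bare encoder, no task-specific head

inputs = tokenizer("An example sentence.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One vector per sentence: mean-pool the last hidden state over the tokens.
embedding = outputs.last_hidden_state.mean(dim=1)
print(embedding.shape)  # torch.Size([1, 768]) for roberta-base
```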

  1. Understanding the difference between the CLM heads of different models in the docs:
    In the code example from the XLM-R docs, the "roberta-base" checkpoint is the monolingual RoBERTa. The RoBERTa model gets loaded from that checkpoint with a CLM head on top, but the base model only understands a single language.

Is this model the same as the one loaded by RobertaForCausalLM.from_pretrained("roberta-base", config=config)? And going further, is the CLM head of RobertaForCausalLM the same as that of XLMRobertaForCausalLM?
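To make this concrete, here is roughly the comparison I have in mind, based on the docs example (the allclose check at the end is just my guess at what "the same model" would mean):

```python
import torch
from transformers import (
    AutoTokenizer,
    RobertaConfig,
    RobertaForCausalLM,
    XLMRobertaConfig,
    XLMRobertaForCausalLM,
)

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")

# Variant A: the XLM-R docs example -- XLM-R classes on the monolingual roberta-base checkpoint.
config_a = XLMRobertaConfig.from_pretrained("roberta-base")
config_a.is_decoder = True
model_a = XLMRobertaForCausalLM.from_pretrained("roberta-base", config=config_a)

# Variant B: the plain RoBERTa classes on the same checkpoint.
config_b = RobertaConfig.from_pretrained("roberta-base")
config_b.is_decoder = True
model_b = RobertaForCausalLM.from_pretrained("roberta-base", config=config_b)

with torch.no_grad():
    logits_a = model_a(**inputs).logits
    logits_b = model_b(**inputs).logits

# My expectation (part of what I'm asking): same checkpoint + same CLM head
# should give identical logits.
print(torch.allclose(logits_a, logits_b))
```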

  2. Why is there a need for two separate classes such as RobertaForCausalLM() and XLMRobertaForCausalLM() if we can load the models (both from pretrained checkpoints and from their configs) with AutoModelForCausalLM()?
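Just to illustrate what I mean by going through the Auto classes (a rough sketch; the class names I print at the end are what I expect to see, not something I’ve verified):

```python
from transformers import AutoConfig, AutoModelForCausalLM

# The Auto* classes read config.json from the checkpoint and dispatch to the
# matching concrete class behind the scenes.
for checkpoint in ["roberta-base", "xlm-roberta-base"]:
    config = AutoConfig.from_pretrained(checkpoint)
    config.is_decoder = True  # needed to use these encoders as standalone causal LMs
    model = AutoModelForCausalLM.from_pretrained(checkpoint, config=config)
    print(checkpoint, "->", type(model).__name__)

# What I expect:
#   roberta-base -> RobertaForCausalLM
#   xlm-roberta-base -> XLMRobertaForCausalLM
```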