Trying to understand the task-specific head for diff. models + Transformers AutoModel

Hi! I’m new to Hugging Face, and I need to use the models to get embeddings from text. While exploring the code and model checkpoints for this, I came across some code in the docs and got confused. Please let me know if I’m thinking in the right direction.
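For context, this is roughly how I’m getting the embeddings at the moment (a minimal sketch with AutoModel; the mean-pooling choice is just my own assumption, not something from the docs):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModel.from_pretrained("roberta-base")  # bare encoder, no task-specific head

inputs = tokenizer("An example sentence.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One vector per sentence: mean-pool the last hidden state over the tokens.
embedding = outputs.last_hidden_state.mean(dim=1)
print(embedding.shape)  # torch.Size([1, 768]) for roberta-base
```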

  1. Understanding the difference between the CLM heads of different models in the docs:
    In the code example from the XLM-R docs, the "roberta-base" checkpoint is the monolingual RoBERTa. The RoBERTa model gets loaded from that checkpoint with a CLM head on top, but the base model only understands a single language.

Is this model the same as the one loaded by RobertaForCausalLM.from_pretrained("roberta-base", config=config)? And going further, is the CLM head of RobertaForCausalLM the same as that of XLMRobertaForCausalLM?
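To make this concrete, here is roughly the comparison I have in mind, based on the docs example (the allclose check at the end is just my guess at what "the same model" would mean):

```python
import torch
from transformers import (
    AutoTokenizer,
    RobertaConfig,
    RobertaForCausalLM,
    XLMRobertaConfig,
    XLMRobertaForCausalLM,
)

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")

# Variant A: the XLM-R docs example -- XLM-R classes on the monolingual roberta-base checkpoint.
config_a = XLMRobertaConfig.from_pretrained("roberta-base")
config_a.is_decoder = True
model_a = XLMRobertaForCausalLM.from_pretrained("roberta-base", config=config_a)

# Variant B: the plain RoBERTa classes on the same checkpoint.
config_b = RobertaConfig.from_pretrained("roberta-base")
config_b.is_decoder = True
model_b = RobertaForCausalLM.from_pretrained("roberta-base", config=config_b)

with torch.no_grad():
    logits_a = model_a(**inputs).logits
    logits_b = model_b(**inputs).logits

# My expectation (part of what I'm asking): same checkpoint + same CLM head
# should give identical logits.
print(torch.allclose(logits_a, logits_b))
```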

  2. Why is there a need for two separate classes such as RobertaForCausalLM() and XLMRobertaForCausalLM() if we can load the models (both from pretrained checkpoints and from their configs) with AutoModelForCausalLM()?
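Just to illustrate what I mean by going through the Auto classes (a rough sketch; the class names I print at the end are what I expect to see, not something I’ve verified):

```python
from transformers import AutoConfig, AutoModelForCausalLM

# The Auto* classes read config.json from the checkpoint and dispatch to the
# matching concrete class behind the scenes.
for checkpoint in ["roberta-base", "xlm-roberta-base"]:
    config = AutoConfig.from_pretrained(checkpoint)
    config.is_decoder = True  # needed to use these encoders as standalone causal LMs
    model = AutoModelForCausalLM.from_pretrained(checkpoint, config=config)
    print(checkpoint, "->", type(model).__name__)

# What I expect:
#   roberta-base -> RobertaForCausalLM
#   xlm-roberta-base -> XLMRobertaForCausalLM
```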