Shouldn't RobertaForCausalLM generate something?

from transformers import RobertaTokenizer, RobertaForCausalLM, RobertaConfig
import torch

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
config = RobertaConfig.from_pretrained("roberta-base")
config.is_decoder = True
model = RobertaForCausalLM.from_pretrained("roberta-base", config=config)

input_ids = tokenizer("The sun is", return_tensors="pt").input_ids
# generate up to 30 tokens
outputs = model.generate(input_ids, do_sample=False, max_length=30)
tokenizer.batch_decode(outputs, skip_special_tokens=True)

Why does RobertaForCausalLM not generate anything?

Hi @dinhanhx,

I don’t think roberta-base has a CLM head on top. If you display model.config, you will find RobertaForMaskedLM listed under architectures.
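You can verify this by inspecting the checkpoint’s config; a quick sketch:

from transformers import RobertaConfig

config = RobertaConfig.from_pretrained("roberta-base")
# the pretrained checkpoint was saved with a masked-LM head, not a causal-LM one
print(config.architectures)  # ['RobertaForMaskedLM']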

Models suited for text generation can be found on the Hugging Face Hub under the text-generation task filter. For instance, this will work fine:

from transformers import AutoTokenizer, AutoModelForCausalLM, AutoConfig
import torch

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
config = AutoConfig.from_pretrained("distilgpt2")
config.is_decoder = True  # not strictly needed here: GPT-2 is already a decoder-only model
model = AutoModelForCausalLM.from_pretrained("distilgpt2", config=config)

input_ids = tokenizer("The sun is shining while", return_tensors="pt").input_ids
# generate up to 30 tokens
outputs = model.generate(input_ids, do_sample=True, max_length=30)
tokenizer.batch_decode(outputs, skip_special_tokens=True)

Hi, RobertaForCausalLM (and related classes like BertForCausalLM) is only meant to be used as the decoder of the composite EncoderDecoderModel, VisionEncoderDecoderModel and SpeechEncoderDecoderModel classes.

from transformers import AutoTokenizer, EncoderDecoderModel

encoder_tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
decoder_tokenizer = AutoTokenizer.from_pretrained("roberta-base")
# compose a BERT encoder with a RoBERTa decoder (loaded as RobertaForCausalLM)
model = EncoderDecoderModel.from_encoder_decoder_pretrained("bert-base-uncased", "roberta-base")

input_ids = encoder_tokenizer("The sun is", return_tensors="pt").input_ids
# generate up to 30 tokens
outputs = model.generate(input_ids, do_sample=False, max_length=30)
decoder_tokenizer.batch_decode(outputs, skip_special_tokens=True)
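Note that, depending on your transformers version, you may also need to set model.config.decoder_start_token_id (e.g. to decoder_tokenizer.cls_token_id) and model.config.pad_token_id before calling generate on a freshly composed EncoderDecoderModel.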

This will instantiate a RobertaForCausalLM as decoder:

print(type(model.decoder))
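On a recent transformers version this should print something like:

<class 'transformers.models.roberta.modeling_roberta.RobertaForCausalLM'>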

Thanks. I’m totally aware of other CausalLM models (distilgpt2, gpt2, …). However, I’m interested specifically in RobertaForCausalLM. Do you know of any pretrained model that uses this architecture on its own, the way gpt2 does?

Thanks. Do you think it’s possible to train RobertaForCausalLM alone on a causal language modeling task, in the same manner as pretraining GPT-2?

Actually, the model can be used just like GPT-2. The reason you weren’t getting any results is that skip_special_tokens=True was passed to the batch_decode method.

As can be seen below, the model does generate text; however, it only generates special tokens (which is expected, as the model needs to be fine-tuned on a downstream dataset):

from transformers import RobertaTokenizer, RobertaForCausalLM, RobertaConfig
import torch

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
config = RobertaConfig.from_pretrained("roberta-base")
config.is_decoder = True
model = RobertaForCausalLM.from_pretrained("roberta-base", config=config)

input_ids = tokenizer("The sun is", return_tensors="pt").input_ids
# generate up to 30 tokens
outputs = model.generate(input_ids, do_sample=False, max_length=30)
tokenizer.batch_decode(outputs)  # no skip_special_tokens=True, so the special tokens are visible

returns:

['<s>The sun is</s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s>']
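To get meaningful generations, you would first fine-tune it with the standard causal-LM objective. Below is a minimal sketch of a single training step; the training sentence and learning rate are hypothetical toy choices, not part of the original discussion:

from transformers import RobertaTokenizer, RobertaForCausalLM, RobertaConfig
from torch.optim import AdamW

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
config = RobertaConfig.from_pretrained("roberta-base")
config.is_decoder = True  # enables the causal attention mask
model = RobertaForCausalLM.from_pretrained("roberta-base", config=config)
model.train()

optimizer = AdamW(model.parameters(), lr=5e-5)

# hypothetical toy batch; a real run would iterate over a dataset
batch = tokenizer("The sun is shining over the hills.", return_tensors="pt")

# passing labels=input_ids makes the model compute the causal-LM loss
# (the input/target shift happens inside the model)
outputs = model(input_ids=batch.input_ids,
                attention_mask=batch.attention_mask,
                labels=batch.input_ids)
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()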

So it’s possible. Thanks for the confirmation.