Shouldn't RobertaForCausalLM generate something?

from transformers import RobertaTokenizer, RobertaForCausalLM, RobertaConfig
import torch

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
config = RobertaConfig.from_pretrained("roberta-base")
config.is_decoder = True
model = RobertaForCausalLM.from_pretrained("roberta-base", config=config)

input_ids = tokenizer("The sun is", return_tensors="pt").input_ids
# generate up to 30 tokens
outputs = model.generate(input_ids, do_sample=False, max_length=30)
tokenizer.batch_decode(outputs, skip_special_tokens=True)

Why does RobertaForCausalLM not generate anything?

Hi @dinhanhx,

I don’t think roberta-base has a CLM head on top. If you display model.config, you will find RobertaForMaskedLM listed under its architectures.

Models suited for text generation can be found here. For instance, this will work fine:

from transformers import AutoTokenizer, AutoModelForCausalLM, AutoConfig
import torch

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
config = AutoConfig.from_pretrained("distilgpt2")
config.is_decoder = True
model = AutoModelForCausalLM.from_pretrained("distilgpt2", config=config)

input_ids = tokenizer("The sun is shining while", return_tensors="pt").input_ids
# generate up to 30 tokens
outputs = model.generate(input_ids, do_sample=True, max_length=30)
tokenizer.batch_decode(outputs, skip_special_tokens=True)

Hi, RobertaForCausalLM (and related classes like BertForCausalLM) is only meant to be used as the decoder in the composite EncoderDecoderModel, VisionEncoderDecoderModel and SpeechEncoderDecoderModel classes.

from transformers import AutoTokenizer, EncoderDecoderModel

encoder_tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
decoder_tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = EncoderDecoderModel.from_encoder_decoder_pretrained("bert-base-uncased", "roberta-base")

input_ids = encoder_tokenizer("The sun is", return_tensors="pt").input_ids
# generate up to 30 tokens
outputs = model.generate(input_ids, do_sample=False, max_length=30)
decoder_tokenizer.batch_decode(outputs, skip_special_tokens=True)

This will instantiate a RobertaForCausalLM as the decoder:

print(type(model.decoder))
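The same pattern applies to the vision variant. A minimal sketch (the ViT checkpoint below is just a common public one chosen for illustration, not something prescribed by this thread):

from transformers import VisionEncoderDecoderModel

# pair a ViT image encoder with roberta-base, which gets loaded as a RobertaForCausalLM decoder
model = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(
    "google/vit-base-patch16-224-in21k", "roberta-base"
)
print(type(model.decoder))  # RobertaForCausalLM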

Thanks. I’m totally aware of other CausalLM models (distilgpt2, gpt2, …). However, I’m interested specifically in RobertaForCausalLM. Do you know of any pretrained model that uses this architecture on its own, the way gpt2 does?

Thanks. Do you think it’s possible to train a standalone RobertaForCausalLM on a causal language modeling task, in the same manner as pretraining GPT-2?

Actually, the model can be used just like GPT-2. The reason you weren’t getting any results is that skip_special_tokens=True was passed to the batch_decode method.

As can be seen, the model does generate text; however, it only generates special tokens (which is expected, as the model needs to be fine-tuned on a downstream dataset):

from transformers import RobertaTokenizer, RobertaForCausalLM, RobertaConfig
import torch

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
config = RobertaConfig.from_pretrained("roberta-base")
config.is_decoder = True
model = RobertaForCausalLM.from_pretrained("roberta-base", config=config)

input_ids = tokenizer("The sun is", return_tensors="pt").input_ids
# generate up to 30 tokens
outputs = model.generate(input_ids, do_sample=False, max_length=30)
tokenizer.batch_decode(outputs)

returns:

['<s>The sun is</s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s>']
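
If you do want to train it with a causal language modeling objective in the same manner as GPT-2, a minimal sketch could look like the following (the dataset, hyperparameters and output directory are placeholders for illustration, not something tested in this thread):

from transformers import (
    RobertaTokenizer, RobertaForCausalLM, RobertaConfig,
    DataCollatorForLanguageModeling, Trainer, TrainingArguments,
)
from datasets import load_dataset

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
config = RobertaConfig.from_pretrained("roberta-base")
config.is_decoder = True  # use a causal attention mask
model = RobertaForCausalLM.from_pretrained("roberta-base", config=config)

# placeholder corpus; any plain-text dataset works
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
dataset = dataset.filter(lambda x: len(x["text"]) > 0)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

# mlm=False makes the collator produce causal LM labels instead of masked LM labels
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="./roberta-clm", per_device_train_batch_size=8, num_train_epochs=1),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()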

So it’s possible. Thanks for the confirmation.

Training the RoBERTa model does not help. It still generates only one token, i.e. </s>.

import pandas as pd
from transformers import AutoModelForCausalLM, AutoTokenizer
from datasets import load_dataset
from trl import SFTTrainer, DataCollatorForCompletionOnlyLM

dataset = load_dataset("lucasmccabe-lmi/CodeAlpaca-20k", split="train")

model_name = "deepset/tinyroberta-squad2"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

def formatting_prompts_func(example):
    output_texts = []
    for i in range(len(example['instruction'])):
        text = f"### Question: {example['instruction'][i]}\n ### Answer: {example['output'][i]}"
        output_texts.append(text)
    return output_texts

response_template = " ### Answer:"
collator = DataCollatorForCompletionOnlyLM(response_template, tokenizer=tokenizer)
trainer = SFTTrainer(
    model,
    train_dataset=dataset,
    formatting_func=formatting_prompts_func,
    data_collator=collator,
)

trainer.train()
trainer.save_model("./trained/") 
example = pd.DataFrame(dataset).head(n=10).iloc[0]
text = f"### Question: {example['instruction']}"

inputs = tokenizer.encode(text, return_tensors="pt")
outputs = model_trained.generate(inputs,max_new_tokens=256)
print(tokenizer.decode(outputs[0]))

This just adds one token </s> at the end.

<s>### Question: Create a function that takes a specific input and produces a specific output using any mathematical operators. Write corresponding code in Python.</s></s>

Hi,

Training a RoBERTa model from “deepset/tinyroberta-squad2” does not make a lot of sense, since that checkpoint is an encoder-only model trained for extractive question answering. Hence, fine-tuning it for a different task (text generation in this case) will not give great results. You could further fine-tune it to perform extractive question answering on a different dataset, but fine-tuning it for text generation is not recommended.

Rather, one typically takes a decoder-only LLM that has already been pre-trained for text generation and fine-tunes it further. An example is taking openai-community/gpt2 · Hugging Face and fine-tuning it for a specific text generation task, e.g. along the lines of the sketch below.
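
A sketch of the same SFTTrainer setup as your script above, just starting from gpt2 instead of the encoder-only QA checkpoint (untested here; the pad-token line is needed because GPT-2 ships without one, and hyperparameters are left at their defaults):

from transformers import AutoModelForCausalLM, AutoTokenizer
from datasets import load_dataset
from trl import SFTTrainer, DataCollatorForCompletionOnlyLM

dataset = load_dataset("lucasmccabe-lmi/CodeAlpaca-20k", split="train")

model_name = "openai-community/gpt2"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

def formatting_prompts_func(example):
    output_texts = []
    for i in range(len(example["instruction"])):
        output_texts.append(
            f"### Question: {example['instruction'][i]}\n ### Answer: {example['output'][i]}"
        )
    return output_texts

# only compute the loss on the answer part of each example
response_template = " ### Answer:"
collator = DataCollatorForCompletionOnlyLM(response_template, tokenizer=tokenizer)

trainer = SFTTrainer(
    model,
    train_dataset=dataset,
    formatting_func=formatting_prompts_func,
    data_collator=collator,
)
trainer.train()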