I tried the following two things and found a significant difference between pipeline and model.generate when completing sequences.
(1) Using model.generate directly
from transformers import AutoModelForCausalLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model_pr = AutoModelForCausalLM.from_pretrained("gpt2")
input_tok = tokenizer("My name is Merve and my favorite", return_tensors="pt")  # prompt inferred from the outputs below
tokenizer.decode(model_pr.generate(**input_tok)[0])
'My name is Merve and my favorite+ and my CR+ and my CR+ and my CR'
(2) Using pipeline to do the same thing
from transformers import pipeline
generator = pipeline('text-generation', model='gpt2')
generator("My name is Merve and my favorite")
[{'generated_text': 'My name is Merve and my favorite brand is Baskin-Robbins.\n\n"They bring a whole lot of stuff to the table and we have to come up with a new way of making a big deal," explains Jeff.'}]
For some reason I get much more sensible output from the pipeline. My understanding was that both should give similar responses.
After digging a bit into the codebase, I found that GPT-2's config sets do_sample=True and max_length=50 as defaults for text generation, as seen here: config.json · openai-community/gpt2 at main.
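For reference, both sets of defaults can be inspected from Python. A minimal sketch, assuming the values currently stored in GPT-2's config.json and the library-wide GenerationConfig defaults:
from transformers import AutoConfig, GenerationConfig
config = AutoConfig.from_pretrained("gpt2")
# Defaults the text-generation pipeline picks up from the model's config.json
print(config.task_specific_params["text-generation"])  # {'do_sample': True, 'max_length': 50}
# Defaults a bare model.generate() falls back to when nothing overrides them
gen_cfg = GenerationConfig()
print(gen_cfg.do_sample, gen_cfg.max_length)  # False 20 -> greedy decoding, at most 20 tokens total
This also explains why the raw model.generate call above produced a short, repetitive continuation: it ran greedy decoding with a much smaller length budget.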
Hence, to get equivalent behaviour, one can do the following:
from transformers import pipeline, set_seed, AutoModelForCausalLM, AutoTokenizer
set_seed(42)
pipe = pipeline(model="gpt2")
prompt = "hello world, my name is"
result = pipe(prompt, do_sample=False)[0]["generated_text"]
print(result)
# equivalent to:
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer(prompt, return_tensors="pt")
generated_ids = model.generate(**inputs, do_sample=False, max_length=50)
generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
print(generated_text)
The pipeline uses the generate() method behind the scenes, but with some default generation keyword arguments, which are documented here. I didn't get equivalent results when using sampling (despite using `set_seed`).
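As an aside, sampled outputs from generate() itself are reproducible if you re-seed immediately before each call. A minimal sketch reusing the model, tokenizer, and inputs from above; it makes generate() deterministic across runs, but it won't necessarily match the pipeline's sampled output:
from transformers import set_seed
set_seed(42)  # seed right before the call so the sampling RNG state is identical
sampled = model.generate(**inputs, do_sample=True, max_length=50)
set_seed(42)  # re-seed and sample again
sampled_again = model.generate(**inputs, do_sample=True, max_length=50)
print((sampled == sampled_again).all())  # True: same seed and same RNG consumption give the same tokens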