Hello,
So I tested both the text-generation pipeline and model.generate() recently and found a very peculiar behavior under the same parameter values. This was using Galactica's 1.3B variant:
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline, set_seed
import torch
checkpoint = "facebook/galactica-1.3b"
tokenizer = AutoTokenizer.from_pretrained(checkpoint, padding_side="left")
model = AutoModelForCausalLM.from_pretrained(checkpoint)
model.to('cuda')
generator = pipeline('text-generation', model=model, tokenizer=tokenizer, device=0)
# With pipeline
set_seed(42)
generator(['Is this', 'What is the matter'], renormalize_logits=True, do_sample=True, use_cache=True, max_new_tokens=10)
# With model.generate()
device=torch.device('cuda',0)
model.to(device)
tokenizer = AutoTokenizer.from_pretrained(checkpoint, padding_side="left")
tokenizer.pad_token = tokenizer.eos_token = '<pad>'
tokenized_prompts = tokenizer(['Is this', 'What is the matter'], padding=True, return_tensors='pt')
set_seed(42)
model_op = model.generate(input_ids=tokenized_prompts['input_ids'].to(device),
attention_mask=tokenized_prompts['attention_mask'].to(device),
renormalize_logits=True, do_sample=True,
use_cache=True, max_new_tokens=10)
tokenizer.batch_decode(model_op, skip_special_tokens=True)
Here are the results with each. With the pipeline:
[{'generated_text': 'Is this method for dealing with multiple objects?\n\n\n'}],
[{'generated_text': 'What is the matter density of a star whose radius is equal to '}]
and with model.generate():
['Is this method for dealing with multiple objects?\n\n\n',
'What is the matter of this, I know that it isn’t']
As we can see, the two methods produce different outputs even under the same settings. However, the first generation is identical for both methods, and I saw the same pattern with a bunch of other prompts. That being said, if we turn off sampling, i.e.
do_sample=False (greedy decoding)
then we get identical results from both. Thus, I believe the discrepancy is related to the sampling step, which produces different results in each case. Does anyone have any thoughts on this?
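For reference, here is roughly the greedy check I ran. This is a minimal sketch that reuses the generator, model, tokenizer, tokenized_prompts, and device objects defined above; the only change from the sampling runs is do_sample=False.
# Greedy check: with do_sample=False there is no randomness, so both paths
# should produce the same text. Reuses the objects defined earlier in this post.
set_seed(42)
pipeline_op = generator(['Is this', 'What is the matter'],
                        renormalize_logits=True, do_sample=False,
                        use_cache=True, max_new_tokens=10)

set_seed(42)
greedy_op = model.generate(input_ids=tokenized_prompts['input_ids'].to(device),
                           attention_mask=tokenized_prompts['attention_mask'].to(device),
                           renormalize_logits=True, do_sample=False,
                           use_cache=True, max_new_tokens=10)

print(pipeline_op)
print(tokenizer.batch_decode(greedy_op, skip_special_tokens=True))
With greedy decoding, both calls print the same completions for me, which is what I meant above.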