I’m trying to run inference with CodeParrot. I’d like to use generate(), because pipeline is too high-level and __call__ is too low-level:
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig

tokenizer = AutoTokenizer.from_pretrained("codeparrot/codeparrot")
model = AutoModelForCausalLM.from_pretrained("codeparrot/codeparrot")

config = GenerationConfig(max_new_tokens=50)

inputs = tokenizer("def hello_world():", return_tensors="pt")
outputs = model.generate(
    inputs=inputs.input_ids,
    generation_config=config,
)
print(tokenizer.decode(outputs[0]))
but I get:
ValueError: If `eos_token_id` is defined, make sure that `pad_token_id` is defined.
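For what it’s worth, passing pad_token_id explicitly does make the error go away. Here’s my minimal repro of the workaround; I’ve swapped in a tiny GPT-2 checkpoint (sshleifer/tiny-gpt2) so it runs quickly, on the assumption that it behaves like CodeParrot, both being GPT-2-style models that ship without a pad token:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig

# Tiny GPT-2 checkpoint swapped in so this runs quickly; my assumption is
# that codeparrot/codeparrot behaves the same way (no pad token defined).
tokenizer = AutoTokenizer.from_pretrained("sshleifer/tiny-gpt2")
model = AutoModelForCausalLM.from_pretrained("sshleifer/tiny-gpt2")

config = GenerationConfig(max_new_tokens=5)
inputs = tokenizer("def hello_world():", return_tensors="pt")

# Workaround: tell generate() to use the eos token for padding.
outputs = model.generate(
    inputs=inputs.input_ids,
    generation_config=config,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0]))
```

So I have a workaround, but I don’t understand why it’s needed.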
This makes no sense to me. Isn’t generate() supposed to basically do the work for me? I can run inference manually with this model without having to touch model details like the pad token or the vocabulary, so how come generate() isn’t able to figure it out?
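To show what I mean by running inference manually, here’s a sketch of my greedy decoding loop, which works fine with no pad token defined anywhere (again using the tiny sshleifer/tiny-gpt2 checkpoint in place of CodeParrot so it runs quickly):

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Tiny GPT-2 checkpoint standing in for codeparrot/codeparrot.
tokenizer = AutoTokenizer.from_pretrained("sshleifer/tiny-gpt2")
model = AutoModelForCausalLM.from_pretrained("sshleifer/tiny-gpt2")
model.eval()

ids = tokenizer("def hello_world():", return_tensors="pt").input_ids
prompt_len = ids.shape[1]

# Manual greedy decoding: append the argmax token 5 times.
# Note there is no pad_token_id anywhere in this loop.
with torch.no_grad():
    for _ in range(5):
        logits = model(ids).logits               # (1, seq_len, vocab_size)
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=-1)

print(tokenizer.decode(ids[0]))
```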