I can’t figure out the correct way to set the generation parameters (temperature, max_new_tokens, etc.) and the model-loading options (torch_dtype, device_map) for transformers.pipeline.
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model=hf_model_id,
    temperature=0.1,
    max_new_tokens=30,
    torch_dtype="auto",
    device_map="auto",
)
pipe(prompt)
If I just pass the arguments to pipeline like this, I get: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation)
So instead I try:
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig, pipeline
hf_model_id = "eachadea/vicuna-7b-1.1"
model = AutoModelForCausalLM.from_pretrained(hf_model_id)
tokenizer = AutoTokenizer.from_pretrained(hf_model_id, legacy=False)
generation_config, unused_kwargs = GenerationConfig.from_pretrained(
    hf_model_id, max_new_tokens=200, temperature=0.1, return_unused_kwargs=True
)
model.generation_config = generation_config

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)
pipe(prompt)
This doesn’t throw the warning, but generation ignores whatever I put in model.generation_config and falls back to the defaults (for example, instead of my 200 max_new_tokens it only generates the default 20 new tokens).
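For what it's worth, the values do seem to land on the GenerationConfig object itself, so the problem appears to be in how the pipeline picks the config up, not in how I build it. Here's a minimal sketch (constructing the config directly rather than with from_pretrained; the do_sample flag is my own addition, since I understand temperature is only used when sampling is enabled):

```python
from transformers import GenerationConfig

# Build the config directly with the same values as in my example above.
# do_sample=True is an assumption on my part: temperature should only
# take effect when sampling is enabled.
gen_cfg = GenerationConfig(max_new_tokens=200, temperature=0.1, do_sample=True)

# The parameters are set on the config object itself:
print(gen_cfg.max_new_tokens)  # 200
print(gen_cfg.temperature)     # 0.1
```

So the config holds the right values; they just don't seem to reach generate when I run the pipeline.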